From e619ffa000b727d7315d4db1730475c778194391 Mon Sep 17 00:00:00 2001 From: rpourzand Date: Wed, 6 Dec 2023 16:51:07 -0800 Subject: [PATCH 01/56] Update sl-jdbc.md We're making some changes to our API (not released yet), so I'm preemptively adding info. --- website/docs/docs/dbt-cloud-apis/sl-jdbc.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md index aba309566f8..b250c9a7818 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md +++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md @@ -254,11 +254,14 @@ Where filters in API allow for a filter list or string. We recommend using the f Where Filters have a few objects that you can use: -- `Dimension()` - Used for any categorical or time dimensions. If used for a time dimension, granularity is required - `Dimension('metric_time').grain('week')` or `Dimension('customer__country')` +- `Dimension()` - Used for any categorical or time dimensions. `Dimension('metric_time').grain('week')` or `Dimension('customer__country')` - `Entity()` - Used for entities like primary and foreign keys - `Entity('order_id')` -Note: If you prefer a more explicit path to create the `where` clause, you can optionally use the `TimeDimension` feature. This helps separate out categorical dimensions from time-related ones. The `TimeDimesion` input takes the time dimension name and also requires granularity, like this: `TimeDimension('metric_time', 'MONTH')`. + +Note: If you prefer a more explicit path to create the `where` clause, you can optionally use the `TimeDimension` feature. This helps separate out categorical dimensions from time-related ones. The `TimeDimension` input takes the time dimension name and also requires granularity, like this: `TimeDimension('metric_time', 'month')`. 
+ +For both `TimeDimension()` and `Dimension()` objects, the grain is only required if the aggregation time dimension for the measures / metrics associated with the where filter is not the same for all measures. - Use the following example to query using a `where` filter with the string format: @@ -315,7 +318,7 @@ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'], order_by=[-'order_gross_profit'] }} ``` -If you are ordering by an object that's been operated on (e.g., change granularity), or you are using the full object notation, descending order must look like: +If you are ordering by an object that's been operated on (e.g., you changed the granularity of the time dimension), or you are using the full object notation, descending order must look like: ```bash select * from {{ From dc49f197ec31317f2b069c64c413eb9f4804274a Mon Sep 17 00:00:00 2001 From: rpourzand Date: Thu, 7 Dec 2023 17:55:44 -0800 Subject: [PATCH 02/56] Update sl-jdbc.md Adding an example from Paul --- website/docs/docs/dbt-cloud-apis/sl-jdbc.md | 60 ++++++++++++++++++++- 1 file changed, 58 insertions(+), 2 deletions(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md index b250c9a7818..2a9c5e95199 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md +++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md @@ -259,9 +259,65 @@ Where Filters have a few objects that you can use: - `Entity()` - Used for entities like primary and foreign keys - `Entity('order_id')` -Note: If you prefer a more explicit path to create the `where` clause, you can optionally use the `TimeDimension` feature. 
This helps separate out categorical dimensions from time-related ones. The `TimeDimension` input takes the time dimension name and also requires granularity, like this: `TimeDimension('metric_time', 'month')`. + +For both `TimeDimension()` and `Dimension()` objects, the grain is only required if the aggregation time dimensions for the measures & metrics associated with the where filter have different grains. + +For example, consider this Semantic Model and Metric Config which contains two metrics that are aggregated across different time grains. This example shows a single Semantic Model, but the same goes for metrics across more than one Semantic Model. + +```yaml +--- +semantic_model: + name: my_model_source + +defaults: + agg_time_dimension: created_month + + measures: + - name: measure_0 + agg: sum + - name: measure_1 + agg: sum + agg_time_dimension: order_year + + dimensions: + - name: created_month + type: time + type_params: + time_granularity: month + - name: order_year + type: time + time_granularity: year +... + + +metrics: + name: metric_0 + description: A metric with a month grain. + type: simple + type_params: + measure: measure_0 + + name: metric_1 + description: A metric with a year grain + type: simple + type_params: + measure: measure_1 + +``` + +Assuming the user is querying metric_0 and metric_1 together, a valid filter would be: + + * `"{{ TimeDimension('metric_time', 'year') }} > '2020-01-01'"` + +Invalid Filters would be: + + `* "{{ TimeDimension('metric_time') }} > '2020-01-01'"` - metrics in the query + are defined based on measures with different grains. + + * `"{{ TimeDimension('metric_time', 'month') }} > '2020-01-01'"` - + metric_1 is not available at a month grain. -For both `TimeDimension()` and `Dimension()` objects, the grain is only required if the aggregation time dimension for the measures / metrics associated with the where filter is not the same for all measures. 
- Use the following example to query using a `where` filter with the string format: From be5c52c0a68fd2f5e2a9958661e52eefb7640511 Mon Sep 17 00:00:00 2001 From: rpourzand Date: Thu, 7 Dec 2023 17:57:48 -0800 Subject: [PATCH 03/56] Update sl-graphql.md copying the same over for jdbc -- maybe we can figure out a way to put this somewhere so we don't have to duplicate --- .../docs/docs/dbt-cloud-apis/sl-graphql.md | 63 ++++++++++++++++++- 1 file changed, 60 insertions(+), 3 deletions(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-graphql.md b/website/docs/docs/dbt-cloud-apis/sl-graphql.md index b7d13d0d453..0fd55397bc9 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-graphql.md +++ b/website/docs/docs/dbt-cloud-apis/sl-graphql.md @@ -432,18 +432,18 @@ mutation { The `where` filter takes a list argument (or a string for a single input). Depending on the object you are filtering, there are a couple of parameters: - - `Dimension()` — Used for any categorical or time dimensions. If used for a time dimension, granularity is required. For example, `Dimension('metric_time').grain('week')` or `Dimension('customer__country')`. + - `Dimension()` — Used for any categorical or time dimensions. For example, `Dimension('metric_time').grain('week')` or `Dimension('customer__country')`. - `Entity()` — Used for entities like primary and foreign keys, such as `Entity('order_id')`. -Note: If you prefer a more strongly typed `where` clause, you can optionally use `TimeDimension()` to separate out categorical dimensions from time ones. The `TimeDimension` input takes the time dimension name and also requires granularity. For example, `TimeDimension('metric_time', 'MONTH')`. +Note: If you prefer a more strongly typed `where` clause, you can optionally use `TimeDimension()` to separate out categorical dimensions from time ones. The `TimeDimension` input takes the time dimension and optionally the granularity level. `TimeDimension('metric_time', 'month')`. 
```graphql mutation { createQuery( environmentId: BigInt! metrics:[{name: "order_total"}] - groupBy:[{name: "customer__customer_type"}, {name: "metric_time", grain: MONTH}] + groupBy:[{name: "customer__customer_type"}, {name: "metric_time", grain: month}] where:[{sql: "{{ Dimension('customer__customer_type') }} = 'new'"}, {sql:"{{ Dimension('metric_time').grain('month') }} > '2022-10-01'"}] ) { queryId @@ -451,6 +451,63 @@ mutation { } ``` +For both `TimeDimension()` and `Dimension()` objects, the grain is only required in the WHERE filter if the aggregation time dimensions for the measures & metrics associated with the where filter have different grains. + +For example, consider this Semantic Model and Metric Configuration which contains two metrics that are aggregated across different time grains. This example shows a single Semantic Model, but the same goes for metrics across more than one Semantic Model. + +```yaml +--- +semantic_model: + name: my_model_source + +defaults: + agg_time_dimension: created_month + + measures: + - name: measure_0 + agg: sum + - name: measure_1 + agg: sum + agg_time_dimension: order_year + + dimensions: + - name: created_month + type: time + type_params: + time_granularity: month + - name: order_year + type: time + time_granularity: year +... + + +metrics: + name: metric_0 + description: A metric with a month grain. + type: simple + type_params: + measure: measure_0 + + name: metric_1 + description: A metric with a year grain + type: simple + type_params: + measure: measure_1 + +``` + +Assuming the user is querying metric_0 and metric_1 together, a valid filter would be: + + * `"{{ TimeDimension('metric_time', 'year') }} > '2020-01-01'"` + +Invalid Filters would be: + + `* "{{ TimeDimension('metric_time') }} > '2020-01-01'"` - metrics in the query + are defined based on measures with different grains. + + * `"{{ TimeDimension('metric_time', 'month') }} > '2020-01-01'"` - + metric_1 is not available at a month grain. 
+ **Query with Order** ```graphql From 8c510b7c3cf7ca717c2d7a1fcc1337f10ad78726 Mon Sep 17 00:00:00 2001 From: rpourzand Date: Thu, 7 Dec 2023 17:58:45 -0800 Subject: [PATCH 04/56] Update sl-jdbc.md timedimension is no longer optional --- website/docs/docs/dbt-cloud-apis/sl-jdbc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md index 2a9c5e95199..3c771cc182c 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md +++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md @@ -259,7 +259,7 @@ Where Filters have a few objects that you can use: - `Entity()` - Used for entities like primary and foreign keys - `Entity('order_id')` -Note: If you prefer a more explicit path to create the `where` clause, you can optionally use the `TimeDimension` feature. This helps separate out categorical dimensions from time-related ones. The `TimeDimension` input takes the time dimension name and also requires granularity, like this: `TimeDimension('metric_time', 'month')`. +Note: If you prefer a more explicit path to create the `where` clause, you can optionally use the `TimeDimension` feature. This helps separate out categorical dimensions from time-related ones. The `TimeDimension` input takes the time dimension name optionally requires granularity, like this: `TimeDimension('metric_time', 'month')`. For both `TimeDimension()` and `Dimension()` objects, the grain is only required if the aggregation time dimensions for the measures & metrics associated with the where filter have different grains. 
From 436577f237fedc60038fb38e7d05ede0d97695d8 Mon Sep 17 00:00:00 2001 From: rpourzand Date: Thu, 7 Dec 2023 18:00:09 -0800 Subject: [PATCH 05/56] Update sl-jdbc.md fixing some updates --- website/docs/docs/dbt-cloud-apis/sl-jdbc.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md index 3c771cc182c..220d4fb2cd4 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md +++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md @@ -261,7 +261,7 @@ Where Filters have a few objects that you can use: Note: If you prefer a more explicit path to create the `where` clause, you can optionally use the `TimeDimension` feature. This helps separate out categorical dimensions from time-related ones. The `TimeDimension` input takes the time dimension name optionally requires granularity, like this: `TimeDimension('metric_time', 'month')`. -For both `TimeDimension()` and `Dimension()` objects, the grain is only required if the aggregation time dimensions for the measures & metrics associated with the where filter have different grains. +For both `TimeDimension()` and `Dimension()` objects, the grain is only required if in the WHERE filter if the aggregation time dimensions for the measures & metrics associated with the where filter have different grains. For example, consider this Semantic Model and Metric Config which contains two metrics that are aggregated across different time grains. This example shows a single Semantic Model, but the same goes for metrics across more than one Semantic Model. @@ -287,10 +287,10 @@ defaults: time_granularity: month - name: order_year type: time - time_granularity: year + type_params: + time_granularity: year ... - metrics: name: metric_0 description: A metric with a month grain. 
@@ -306,7 +306,7 @@ metrics: ``` -Assuming the user is querying metric_0 and metric_1 together, a valid filter would be: +Assuming the user is querying metric_0 and metric_1 together in a single request, a valid WHERE filter would be: * `"{{ TimeDimension('metric_time', 'year') }} > '2020-01-01'"` From c250bb6490a1d4646d53907378cc789b9a283567 Mon Sep 17 00:00:00 2001 From: rpourzand Date: Thu, 7 Dec 2023 18:00:59 -0800 Subject: [PATCH 06/56] Update sl-graphql.md fixing typo --- website/docs/docs/dbt-cloud-apis/sl-graphql.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-graphql.md b/website/docs/docs/dbt-cloud-apis/sl-graphql.md index 0fd55397bc9..163380bb647 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-graphql.md +++ b/website/docs/docs/dbt-cloud-apis/sl-graphql.md @@ -477,7 +477,8 @@ defaults: time_granularity: month - name: order_year type: time - time_granularity: year + type_params: + time_granularity: year ... From 61af03e70e666c1e702dca66428bed253b092f5a Mon Sep 17 00:00:00 2001 From: rpourzand Date: Fri, 8 Dec 2023 14:52:59 -0800 Subject: [PATCH 07/56] Update sl-graphql.md making paul's reco's --- website/docs/docs/dbt-cloud-apis/sl-graphql.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-graphql.md b/website/docs/docs/dbt-cloud-apis/sl-graphql.md index 163380bb647..7b4eeddfee6 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-graphql.md +++ b/website/docs/docs/dbt-cloud-apis/sl-graphql.md @@ -436,7 +436,7 @@ The `where` filter takes a list argument (or a string for a single input). Depen - `Entity()` — Used for entities like primary and foreign keys, such as `Entity('order_id')`. -Note: If you prefer a more strongly typed `where` clause, you can optionally use `TimeDimension()` to separate out categorical dimensions from time ones. The `TimeDimension` input takes the time dimension and optionally the granularity level. 
`TimeDimension('metric_time', 'month')`. +Note: If you prefer a `where` clause with a more explicit path, you can optionally use `TimeDimension()` to separate out categorical dimensions from time ones. The `TimeDimension` input takes the time dimension and optionally the granularity level. `TimeDimension('metric_time', 'month')`. ```graphql mutation { @@ -502,8 +502,8 @@ Assuming the user is querying metric_0 and metric_1 together, a valid filter wou * `"{{ TimeDimension('metric_time', 'year') }} > '2020-01-01'"` Invalid Filters would be: - - `* "{{ TimeDimension('metric_time') }} > '2020-01-01'"` - metrics in the query + + * ` "{{ TimeDimension('metric_time') }} > '2020-01-01'"` - metrics in the query are defined based on measures with different grains. * `"{{ TimeDimension('metric_time', 'month') }} > '2020-01-01'"` - From c5afac9f933dc2bba6aa7f28c5f782afd2ba1448 Mon Sep 17 00:00:00 2001 From: rpourzand Date: Fri, 8 Dec 2023 14:53:53 -0800 Subject: [PATCH 08/56] Update sl-jdbc.md paul's recos --- website/docs/docs/dbt-cloud-apis/sl-jdbc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md index 220d4fb2cd4..b1d90721d24 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md +++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md @@ -312,7 +312,7 @@ Assuming the user is querying metric_0 and metric_1 together in a single request Invalid Filters would be: - `* "{{ TimeDimension('metric_time') }} > '2020-01-01'"` - metrics in the query + * `"{{ TimeDimension('metric_time') }} > '2020-01-01'"` - metrics in the query are defined based on measures with different grains. 
* `"{{ TimeDimension('metric_time', 'month') }} > '2020-01-01'"` - From 384eba8856f59a341af767b744dc866673f9d66a Mon Sep 17 00:00:00 2001 From: rpourzand Date: Tue, 19 Dec 2023 10:59:29 -0800 Subject: [PATCH 09/56] Update sl-graphql.md also adding a dimension only example to it --- website/docs/docs/dbt-cloud-apis/sl-graphql.md | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/website/docs/docs/dbt-cloud-apis/sl-graphql.md b/website/docs/docs/dbt-cloud-apis/sl-graphql.md index 3591c36cbd3..774b0c53d87 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-graphql.md +++ b/website/docs/docs/dbt-cloud-apis/sl-graphql.md @@ -204,6 +204,8 @@ DimensionType = [CATEGORICAL, TIME] ### Querying +When querying for data, _either_ a `groupBy` _or_ a `metrics` selection is required. + **Create Dimension Values query** ```graphql @@ -428,6 +430,19 @@ mutation { } ``` +**Query a categorical dimension on its own** + +```graphql +mutation { + createQuery( + environmentId: 123456 + groupBy: [{name: "customer__customer_type"}] + ) { + queryId + } +} +``` + **Query with a where filter** The `where` filter takes a list argument (or a string for a single input). Depending on the object you are filtering, there are a couple of parameters: From bc2239ac113dcbeff4e97830768475ada91b4c9b Mon Sep 17 00:00:00 2001 From: rpourzand Date: Wed, 20 Dec 2023 17:07:12 -0800 Subject: [PATCH 10/56] Update sl-jdbc.md updating per jordan's input --- website/docs/docs/dbt-cloud-apis/sl-jdbc.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md index 69428878f80..a1f79208e6d 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md +++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md @@ -256,12 +256,12 @@ Where Filters have a few objects that you can use: - `Dimension()` - Used for any categorical or time dimensions. 
`Dimension('metric_time').grain('week')` or `Dimension('customer__country')` -- `Entity()` - Used for entities like primary and foreign keys - `Entity('order_id')` +- `TimeDimension()` - Used as a more explicit definition for time dimensions, optionally takes in a granularity `TimeDimension('metric_time', 'month')` +- `Entity()` - Used for entities like primary and foreign keys - `Entity('order_id')` -Note: If you prefer a more explicit path to create the `where` clause, you can optionally use the `TimeDimension` feature. This helps separate out categorical dimensions from time-related ones. The `TimeDimension` input takes the time dimension name optionally requires granularity, like this: `TimeDimension('metric_time', 'month')`. -For both `TimeDimension()` and `Dimension()` objects, the grain is only required if in the WHERE filter if the aggregation time dimensions for the measures & metrics associated with the where filter have different grains. +For `TimeDimension()`, the grain is only required in the WHERE filter if the aggregation time dimensions for the measures & metrics associated with the where filter have different grains. For example, consider this Semantic Model and Metric Config which contains two metrics that are aggregated across different time grains. This example shows a single Semantic Model, but the same goes for metrics across more than one Semantic Model. 
From a73e79398bb432a46133c729594eafe8ba3a4743 Mon Sep 17 00:00:00 2001 From: mirnawong1 <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 21 Dec 2023 09:23:45 -0500 Subject: [PATCH 11/56] Update sl-graphql.md format yaml --- .../docs/docs/dbt-cloud-apis/sl-graphql.md | 41 ++++++++----------- 1 file changed, 16 insertions(+), 25 deletions(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-graphql.md b/website/docs/docs/dbt-cloud-apis/sl-graphql.md index 774b0c53d87..28ac38936ca 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-graphql.md +++ b/website/docs/docs/dbt-cloud-apis/sl-graphql.md @@ -471,20 +471,17 @@ For both `TimeDimension()` and `Dimension()` objects, the grain is only required For example, consider this Semantic Model and Metric Configuration which contains two metrics that are aggregated across different time grains. This example shows a single Semantic Model, but the same goes for metrics across more than one Semantic Model. ```yaml ---- semantic_model: name: my_model_source - -defaults: - agg_time_dimension: created_month +defaults: + agg_time_dimension: created_month measures: - name: measure_0 agg: sum - name: measure_1 agg: sum agg_time_dimension: order_year - dimensions: - name: created_month type: time @@ -494,35 +491,29 @@ defaults: type: time type_params: time_granularity: year -... - metrics: - name: metric_0 - description: A metric with a month grain. - type: simple - type_params: - measure: measure_0 - - name: metric_1 - description: A metric with a year grain - type: simple - type_params: - measure: measure_1 - + - name: metric_0 + description: A metric with a month grain. + type: simple + type_params: + measure: measure_0 + - name: metric_1 + description: A metric with a year grain. 
+ type: simple + type_params: + measure: measure_1 ``` -Assuming the user is querying metric_0 and metric_1 together, a valid filter would be: +Assuming the user is querying `metric_0` and `metric_1` together, a valid filter would be: * `"{{ TimeDimension('metric_time', 'year') }} > '2020-01-01'"` -Invalid Filters would be: +Invalid filters would be: - * ` "{{ TimeDimension('metric_time') }} > '2020-01-01'"` - metrics in the query - are defined based on measures with different grains. + * ` "{{ TimeDimension('metric_time') }} > '2020-01-01'"` — metrics in the query are defined based on measures with different grains. - * `"{{ TimeDimension('metric_time', 'month') }} > '2020-01-01'"` - - metric_1 is not available at a month grain. + * `"{{ TimeDimension('metric_time', 'month') }} > '2020-01-01'"` — `metric_1` is not available at a month grain. **Query with Order** ```graphql From ceea7dc2bc35825d1dfd8ed26c2dbf9e616e7bef Mon Sep 17 00:00:00 2001 From: mirnawong1 <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 21 Dec 2023 09:25:26 -0500 Subject: [PATCH 12/56] Update website/docs/docs/dbt-cloud-apis/sl-graphql.md --- website/docs/docs/dbt-cloud-apis/sl-graphql.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-graphql.md b/website/docs/docs/dbt-cloud-apis/sl-graphql.md index 28ac38936ca..77d50d1d293 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-graphql.md +++ b/website/docs/docs/dbt-cloud-apis/sl-graphql.md @@ -468,7 +468,7 @@ mutation { For both `TimeDimension()` and `Dimension()` objects, the grain is only required in the WHERE filter if the aggregation time dimensions for the measures & metrics associated with the where filter have different grains. -For example, consider this Semantic Model and Metric Configuration which contains two metrics that are aggregated across different time grains. This example shows a single Semantic Model, but the same goes for metrics across more than one Semantic Model. 
+For example, consider this Semantic model and Metric configuration, which contains two metrics that are aggregated across different time grains. This example shows a single semantic model, but the same goes for metrics across more than one semantic model. ```yaml semantic_model: From 5f832e9d6ecb9468aeb615ef66f227203e7360f5 Mon Sep 17 00:00:00 2001 From: mirnawong1 <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 21 Dec 2023 09:25:50 -0500 Subject: [PATCH 13/56] Update website/docs/docs/dbt-cloud-apis/sl-jdbc.md --- website/docs/docs/dbt-cloud-apis/sl-jdbc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md index a1f79208e6d..09fb73c1c9f 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md +++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md @@ -254,7 +254,7 @@ Where filters in API allow for a filter list or string. We recommend using the f Where Filters have a few objects that you can use: -- `Dimension()` - Used for any categorical or time dimensions. `Dimension('metric_time').grain('week')` or `Dimension('customer__country')` +- `Dimension()` — Used for any categorical or time dimensions. 
`Dimension('metric_time').grain('week')` or `Dimension('customer__country')` - `TimeDimension()` - Used as a more explicit definition for time dimensions, optionally takes in a granularity `TimeDimension('metric_time', 'month')` From 1b774190a0e67f9c7e5641eddb687f16bd025e70 Mon Sep 17 00:00:00 2001 From: mirnawong1 <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 21 Dec 2023 09:26:04 -0500 Subject: [PATCH 14/56] Update website/docs/docs/dbt-cloud-apis/sl-jdbc.md --- website/docs/docs/dbt-cloud-apis/sl-jdbc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md index 09fb73c1c9f..32e57208126 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md +++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md @@ -256,7 +256,7 @@ Where Filters have a few objects that you can use: - `Dimension()` — Used for any categorical or time dimensions. `Dimension('metric_time').grain('week')` or `Dimension('customer__country')` -- `TimeDimension()` - Used as a more explicit definition for time dimensions, optionally takes in a granularity `TimeDimension('metric_time', 'month')` +- `TimeDimension()` — Used as a more explicit definition for time dimensions, optionally takes in a granularity `TimeDimension('metric_time', 'month')` - `Entity()` - Used for entities like primary and foreign keys - `Entity('order_id')` From e06152133bcd3184181cfc371ba05152b45ae626 Mon Sep 17 00:00:00 2001 From: mirnawong1 <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 21 Dec 2023 09:26:29 -0500 Subject: [PATCH 15/56] Update website/docs/docs/dbt-cloud-apis/sl-jdbc.md --- website/docs/docs/dbt-cloud-apis/sl-jdbc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md index 32e57208126..98fe99fc713 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md +++ 
b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md @@ -258,7 +258,7 @@ Where Filters have a few objects that you can use: - `TimeDimension()` — Used as a more explicit definition for time dimensions, optionally takes in a granularity `TimeDimension('metric_time', 'month')` -- `Entity()` - Used for entities like primary and foreign keys - `Entity('order_id')` +- `Entity()` — Used for entities like primary and foreign keys - `Entity('order_id')` For `TimeDimension()`, the grain is only required in the WHERE filter if the aggregation time dimensions for the measures & metrics associated with the where filter have different grains. From 7aa722d1e834b64b74451d79637adeafae1a137d Mon Sep 17 00:00:00 2001 From: mirnawong1 <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 21 Dec 2023 09:28:37 -0500 Subject: [PATCH 16/56] Update website/docs/docs/dbt-cloud-apis/sl-jdbc.md --- website/docs/docs/dbt-cloud-apis/sl-jdbc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md index 98fe99fc713..60fd45d2424 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md +++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md @@ -261,7 +261,7 @@ Where Filters have a few objects that you can use: - `Entity()` — Used for entities like primary and foreign keys - `Entity('order_id')` -For `TimeDimension()`, the grain is only required in the WHERE filter if the aggregation time dimensions for the measures & metrics associated with the where filter have different grains. +For `TimeDimension()`, the grain is only required in the `WHERE` filter if the aggregation time dimensions for the measures and metrics associated with the where filter have different grains. For example, consider this Semantic Model and Metric Config which contains two metrics that are aggregated across different time grains. 
This example shows a single Semantic Model, but the same goes for metrics across more than one Semantic Model. From 7dc125e79f33ec8dba66c98534a581eca4791b20 Mon Sep 17 00:00:00 2001 From: mirnawong1 <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 21 Dec 2023 09:34:29 -0500 Subject: [PATCH 17/56] Update website/docs/docs/dbt-cloud-apis/sl-jdbc.md --- website/docs/docs/dbt-cloud-apis/sl-jdbc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md index 60fd45d2424..b6eb03affde 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md +++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md @@ -263,7 +263,7 @@ Where Filters have a few objects that you can use: For `TimeDimension()`, the grain is only required in the `WHERE` filter if the aggregation time dimensions for the measures and metrics associated with the where filter have different grains. -For example, consider this Semantic Model and Metric Config which contains two metrics that are aggregated across different time grains. This example shows a single Semantic Model, but the same goes for metrics across more than one Semantic Model. +For example, consider this Semantic model and Metric config, which contains two metrics that are aggregated across different time grains. This example shows a single semantic model, but the same goes for metrics across more than one semantic model. 
```yaml --- From ece5d3f662660a66ab6d971272d2cc22c12ca966 Mon Sep 17 00:00:00 2001 From: mirnawong1 <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 21 Dec 2023 09:39:20 -0500 Subject: [PATCH 18/56] Update sl-jdbc.md --- website/docs/docs/dbt-cloud-apis/sl-jdbc.md | 70 +++++++++++---------- 1 file changed, 36 insertions(+), 34 deletions(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md index b6eb03affde..41d57994cc7 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md +++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md @@ -177,8 +177,6 @@ To query metric values, here are the following parameters that are available: |`order` | Order the data returned by a particular field | `order_by=['order_gross_profit']`, use `-` for descending, or full object notation if the object is operated on: `order_by=[Metric('order_gross_profit').descending(True)`] | Optional | | `compile` | If true, returns generated SQL for the data platform but does not execute | `compile=True` | Optional | - - ## Note on time dimensions and `metric_time` You will notice that in the list of dimensions for all metrics, there is a dimension called `metric_time`. `Metric_time` is a reserved keyword for the measure-specific aggregation time dimensions. For any time-series metric, the `metric_time` keyword should always be available for use in queries. This is a common dimension across *all* metrics in a semantic graph. @@ -266,20 +264,17 @@ For `TimeDimension()`, the grain is only required in the `WHERE` filter if the a For example, consider this Semantic model and Metric config, which contains two metrics that are aggregated across different time grains. This example shows a single semantic model, but the same goes for metrics across more than one semantic model. 
```yaml ---- semantic_model: name: my_model_source - -defaults: - agg_time_dimension: created_month +defaults: + agg_time_dimension: created_month measures: - name: measure_0 agg: sum - name: measure_1 agg: sum agg_time_dimension: order_year - dimensions: - name: created_month type: time @@ -288,36 +283,31 @@ defaults: - name: order_year type: time type_params: - time_granularity: year -... + time_granularity: year metrics: - name: metric_0 - description: A metric with a month grain. - type: simple - type_params: - measure: measure_0 - - name: metric_1 - description: A metric with a year grain - type: simple - type_params: - measure: measure_1 + - name: metric_0 + description: A metric with a month grain. + type: simple + type_params: + measure: measure_0 + - name: metric_1 + description: A metric with a year grain. + type: simple + type_params: + measure: measure_1 ``` -Assuming the user is querying metric_0 and metric_1 together in a single request, a valid WHERE filter would be: +Assuming the user is querying `metric_0` and `metric_1` together in a single request, a valid `WHERE` filter would be: * `"{{ TimeDimension('metric_time', 'year') }} > '2020-01-01'"` Invalid Filters would be: - * `"{{ TimeDimension('metric_time') }} > '2020-01-01'"` - metrics in the query - are defined based on measures with different grains. - - * `"{{ TimeDimension('metric_time', 'month') }} > '2020-01-01'"` - - metric_1 is not available at a month grain. + * `"{{ TimeDimension('metric_time') }} > '2020-01-01'"` — metrics in the query are defined based on measures with different grains. + * `"{{ TimeDimension('metric_time', 'month') }} > '2020-01-01'"` — `metric_1` is not available at a month grain. 
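The grain rule in the hunk above can be sketched as a small check. This is a simplified model for reading along, not MetricFlow's actual validation logic: `checkWhereGrain` and `GRAIN_ORDER` are hypothetical names, and the grain list is an assumed subset ordered from finest to coarsest.

```javascript
// Simplified model of the TimeDimension grain rule; not the real MetricFlow validator.
// Assumed subset of grains, ordered from finest to coarsest.
const GRAIN_ORDER = ['day', 'month', 'quarter', 'year'];

// metricGrains: aggregation grain of each queried metric, e.g. ['month', 'year'].
// filterGrain: grain passed to TimeDimension('metric_time', <grain>), or undefined if omitted.
function checkWhereGrain(metricGrains, filterGrain) {
  const distinct = [...new Set(metricGrains)];

  if (filterGrain === undefined) {
    // A grain may be omitted only when every metric aggregates at the same grain.
    return distinct.length <= 1
      ? { valid: true }
      : { valid: false, reason: 'metrics have different grains, so a grain is required' };
  }

  const filterIndex = GRAIN_ORDER.indexOf(filterGrain);
  if (filterIndex === -1) {
    return { valid: false, reason: `unknown grain: ${filterGrain}` };
  }

  // The filter grain must be at least as coarse as the coarsest metric grain.
  const coarsest = Math.max(...distinct.map((g) => GRAIN_ORDER.indexOf(g)));
  if (filterIndex < coarsest) {
    return { valid: false, reason: `a metric is not available at a ${filterGrain} grain` };
  }
  return { valid: true };
}
```

With `metric_0` at a month grain and `metric_1` at a year grain, `checkWhereGrain(['month', 'year'], 'year')` is accepted, while omitting the grain or passing `'month'` is rejected, matching the valid and invalid filters listed in the hunk.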
- Use the following example to query using a `where` filter with the string format: @@ -350,7 +340,8 @@ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'], group_by=[Dimension('metric_time')], limit=10) }} -``` +``` + ### Query with Order By Examples Order By can take a basic string that's a Dimension, Metric, or Entity and this will default to ascending order @@ -373,7 +364,8 @@ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'], limit=10, order_by=[-'order_gross_profit'] }} -``` +``` + If you are ordering by an object that's been operated on (e.g., you changed the the granularity of the time dimension), or you are using the full object notation, descending order must look like: ```bash @@ -413,14 +405,24 @@ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'], -- **Why do some dimensions use different syntax, like `metric_time` versus `[Dimension('metric_time')`?**
- When you select a dimension on its own, such as `metric_time` you can use the shorthand method which doesn't need the “Dimension” syntax. However, when you perform operations on the dimension, such as adding granularity, the object syntax `[Dimension('metric_time')` is required. + + +When you select a dimension on its own, such as `metric_time` you can use the shorthand method which doesn't need the “Dimension” syntax. However, when you perform operations on the dimension, such as adding granularity, the object syntax `[Dimension('metric_time')` is required. + + + + + +The double underscore `"__"` syntax indicates a mapping from an entity to a dimension, as well as where the dimension is located. For example, `user__country` means someone is looking at the `country` dimension from the `user` table. + + + -- **What does the double underscore `"__"` syntax in dimensions mean?**
- The double underscore `"__"` syntax indicates a mapping from an entity to a dimension, as well as where the dimension is located. For example, `user__country` means someone is looking at the `country` dimension from the `user` table. + + +The default output follows the format `{time_dimension_name}__{granularity_level}`. So for example, if the time dimension name is `ds` and the granularity level is yearly, the output is `ds__year`. -- **What is the default output when adding granularity?**
- The default output follows the format `{time_dimension_name}__{granularity_level}`. So for example, if the time dimension name is `ds` and the granularity level is yearly, the output is `ds__year`. +
## Related docs From b7cd78a646175341f0632b7a89027f4d89e34491 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Thu, 21 Dec 2023 10:05:47 -0500 Subject: [PATCH 19/56] add toggles --- website/docs/docs/dbt-cloud-apis/sl-jdbc.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md index 41d57994cc7..b3d9b961957 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md +++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md @@ -403,24 +403,24 @@ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'], ## FAQs - +
- - -When you select a dimension on its own, such as `metric_time` you can use the shorthand method which doesn't need the “Dimension” syntax. However, when you perform operations on the dimension, such as adding granularity, the object syntax `[Dimension('metric_time')` is required. + +When you select a dimension on its own, such as `metric_time` you can use the shorthand method which doesn't need the “Dimension” syntax. +However, when you perform operations on the dimension, such as adding granularity, the object syntax `[Dimension('metric_time')` is required. - + The double underscore `"__"` syntax indicates a mapping from an entity to a dimension, as well as where the dimension is located. For example, `user__country` means someone is looking at the `country` dimension from the `user` table. - - - -The default output follows the format `{time_dimension_name}__{granularity_level}`. So for example, if the time dimension name is `ds` and the granularity level is yearly, the output is `ds__year`. + +The default output follows the format `{{time_dimension_name}__{granularity_level}}`. + +So for example, if the `time_dimension_name` is `ds` and the granularity level is yearly, the output is `ds__year`. From 7b8cc9ad8da5b62d89a03e1bddb40173fd473d27 Mon Sep 17 00:00:00 2001 From: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> Date: Fri, 22 Dec 2023 14:24:36 -0500 Subject: [PATCH 20/56] Apply suggestions from code review --- website/docs/docs/dbt-cloud-apis/sl-jdbc.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md index b3d9b961957..6c57cdec161 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md +++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md @@ -252,11 +252,11 @@ Where filters in API allow for a filter list or string. 
We recommend using the f Where Filters have a few objects that you can use: -- `Dimension()` — Used for any categorical or time dimensions. `Dimension('metric_time').grain('week')` or `Dimension('customer__country')` +- `Dimension()` — Used for any categorical or time dimensions. `Dimension('metric_time').grain('week')` or `Dimension('customer__country')`. -- `TimeDimension()` — Used as a more explicit definition for time dimensions, optionally takes in a granularity `TimeDimension('metric_time', 'month')` +- `TimeDimension()` — Used as a more explicit definition for time dimensions, optionally takes in a granularity `TimeDimension('metric_time', 'month')`. -- `Entity()` — Used for entities like primary and foreign keys - `Entity('order_id')` +- `Entity()` — Used for entities like primary and foreign keys - `Entity('order_id')`. For `TimeDimension()`, the grain is only required in the `WHERE` filter if the aggregation time dimensions for the measures and metrics associated with the where filter have different grains. From 7cf66335a501d9f74ba2bddc46e5d577f5d0c807 Mon Sep 17 00:00:00 2001 From: Ly Nguyen Date: Wed, 3 Jan 2024 16:29:20 -0800 Subject: [PATCH 21/56] Update CI jobs --- website/docs/docs/deploy/ci-jobs.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/website/docs/docs/deploy/ci-jobs.md b/website/docs/docs/deploy/ci-jobs.md index 149a6951fdc..32b8407971e 100644 --- a/website/docs/docs/deploy/ci-jobs.md +++ b/website/docs/docs/deploy/ci-jobs.md @@ -11,12 +11,12 @@ You can set up [continuous integration](/docs/deploy/continuous-integration) (CI dbt Labs recommends that you create your CI job in a dedicated dbt Cloud [deployment environment](/docs/deploy/deploy-environments#create-a-deployment-environment) that's connected to a staging database. Having a separate environment dedicated for CI will provide better isolation between your temporary CI schema builds and your production data builds. 
Additionally, sometimes teams need their CI jobs to be triggered when a PR is made to a branch other than main. If your team maintains a staging branch as part of your release process, having a separate environment will allow you to set a [custom branch](/faqs/environments/custom-branch-settings) and, accordingly, the CI job in that dedicated environment will be triggered only when PRs are made to the specified custom branch. To learn more, refer to [Get started with CI tests](/guides/set-up-ci). + ### Prerequisites - You have a dbt Cloud account. - For the [Concurrent CI checks](/docs/deploy/continuous-integration#concurrent-ci-checks) and [Smart cancellation of stale builds](/docs/deploy/continuous-integration#smart-cancellation) features, your dbt Cloud account must be on the [Team or Enterprise plan](https://www.getdbt.com/pricing/). -- You must be connected using dbt Cloud’s native Git integration with [GitHub](/docs/cloud/git/connect-github), [GitLab](/docs/cloud/git/connect-gitlab), or [Azure DevOps](/docs/cloud/git/connect-azure-devops). - - With GitLab, you need a paid or self-hosted account which includes support for GitLab webhooks and [project access tokens](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html). With GitLab Free, merge requests will invoke CI jobs but CI status updates (success or failure of the job) will not be reported back to GitLab. - - If you previously configured your dbt project by providing a generic git URL that clones using SSH, you must reconfigure the project to connect through dbt Cloud's native integration. +- If you're using [GitLab](/docs/cloud/git/connect-gitlab), you need a paid or self-hosted account which includes support for GitLab webhooks and [project access tokens](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html). With GitLab Free, merge requests will invoke CI jobs but CI status updates (success or failure of the job) will not be reported back to GitLab. 
+- If you previously configured your dbt project by providing a generic git URL that clones using SSH, you must reconfigure the project to connect through dbt Cloud's native integration. To make CI job creation easier, many options on the **CI job** page are set to default values that dbt Labs recommends that you use. If you don't want to use the defaults, you can change them. @@ -63,12 +63,13 @@ If you're not using dbt Cloud’s native Git integration with [GitHub](/docs/cl 1. Set up a CI job with the [Create Job](/dbt-cloud/api-v2#/operations/Create%20Job) API endpoint using `"job_type": ci` or from the [dbt Cloud UI](#set-up-ci-jobs). -1. Call the [Trigger Job Run](/dbt-cloud/api-v2#/operations/Trigger%20Job%20Run) API endpoint to trigger the CI job. You must include these fields to the payload: - - Provide the pull request (PR) ID with one of these fields, even if you're using a different Git provider (like Bitbucket). This can make your code less human-readable but it will _not_ affect dbt functionality. +1. Call the [Trigger Job Run](/dbt-cloud/api-v2#/operations/Trigger%20Job%20Run) API endpoint to trigger the CI job. You must include both of these fields to the payload: + - Provide the pull request (PR) ID using one of these fields: - `github_pull_request_id` - `gitlab_merge_request_id` - - `azure_devops_pull_request_id`  + - `azure_devops_pull_request_id` + - `non_native_pull_request_id` (for example, BitBucket) - Provide the `git_sha` or `git_branch` to target the correct commit or branch to run the job against. 
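The two payload requirements in the hunk above can be sketched as a payload builder. The field names are taken from the list in the patch; `buildCiTriggerPayload` is a hypothetical helper rather than part of any dbt Cloud client, and any other fields the Trigger Job Run endpoint expects are left out of this sketch.

```javascript
// PR ID field names as listed in the patch above.
const PR_ID_FIELDS = [
  'github_pull_request_id',
  'gitlab_merge_request_id',
  'azure_devops_pull_request_id',
  'non_native_pull_request_id',
];

// Hypothetical helper: builds only the CI-specific part of the request body.
function buildCiTriggerPayload({ prField, prId, gitSha, gitBranch }) {
  if (!PR_ID_FIELDS.includes(prField)) {
    throw new Error(`Unknown PR ID field: ${prField}`);
  }
  if (prId === undefined || prId === null) {
    throw new Error('A pull request ID is required');
  }
  if (!gitSha && !gitBranch) {
    throw new Error('Provide git_sha or git_branch');
  }

  // Computed key: exactly one of the PR ID fields is set.
  const payload = { [prField]: prId };
  if (gitSha) payload.git_sha = gitSha;
  if (gitBranch) payload.git_branch = gitBranch;
  return payload;
}
```

For a provider without a native integration, a call such as `buildCiTriggerPayload({ prField: 'non_native_pull_request_id', prId: 42, gitBranch: 'feature/x' })` yields an object with just those two keys, which would then be merged into the full request body.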
## Example pull requests From 2fef3fcdfcdecc601e82cc127a0fd3cdbc6c53bf Mon Sep 17 00:00:00 2001 From: Ly Nguyen <107218380+nghi-ly@users.noreply.github.com> Date: Thu, 4 Jan 2024 17:09:51 -0800 Subject: [PATCH 22/56] Update website/docs/docs/deploy/ci-jobs.md Co-authored-by: Katia Nartovich <56835998+knartovich@users.noreply.github.com> --- website/docs/docs/deploy/ci-jobs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/deploy/ci-jobs.md b/website/docs/docs/deploy/ci-jobs.md index 32b8407971e..77d8bc990fa 100644 --- a/website/docs/docs/deploy/ci-jobs.md +++ b/website/docs/docs/deploy/ci-jobs.md @@ -15,7 +15,7 @@ dbt Labs recommends that you create your CI job in a dedicated dbt Cloud [deploy ### Prerequisites - You have a dbt Cloud account. - For the [Concurrent CI checks](/docs/deploy/continuous-integration#concurrent-ci-checks) and [Smart cancellation of stale builds](/docs/deploy/continuous-integration#smart-cancellation) features, your dbt Cloud account must be on the [Team or Enterprise plan](https://www.getdbt.com/pricing/). -- If you're using [GitLab](/docs/cloud/git/connect-gitlab), you need a paid or self-hosted account which includes support for GitLab webhooks and [project access tokens](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html). With GitLab Free, merge requests will invoke CI jobs but CI status updates (success or failure of the job) will not be reported back to GitLab. +- If you're using a native [GitLab](/docs/cloud/git/connect-gitlab) integration, you need a paid or self-hosted account which includes support for GitLab webhooks and [project access tokens](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html). With GitLab Free, merge requests will invoke CI jobs but CI status updates (success or failure of the job) will not be reported back to GitLab. 
- If you previously configured your dbt project by providing a generic git URL that clones using SSH, you must reconfigure the project to connect through dbt Cloud's native integration. From e5a191e40f1ae93ea85caacfb6612fadefc0faa0 Mon Sep 17 00:00:00 2001 From: Ly Nguyen Date: Mon, 8 Jan 2024 11:01:39 -0800 Subject: [PATCH 23/56] Feedback --- website/docs/docs/deploy/ci-jobs.md | 23 ++++--------------- .../docs/deploy/continuous-integration.md | 1 - 2 files changed, 4 insertions(+), 20 deletions(-) diff --git a/website/docs/docs/deploy/ci-jobs.md b/website/docs/docs/deploy/ci-jobs.md index 77d8bc990fa..c23ce78bceb 100644 --- a/website/docs/docs/deploy/ci-jobs.md +++ b/website/docs/docs/deploy/ci-jobs.md @@ -14,9 +14,10 @@ dbt Labs recommends that you create your CI job in a dedicated dbt Cloud [deploy ### Prerequisites - You have a dbt Cloud account. -- For the [Concurrent CI checks](/docs/deploy/continuous-integration#concurrent-ci-checks) and [Smart cancellation of stale builds](/docs/deploy/continuous-integration#smart-cancellation) features, your dbt Cloud account must be on the [Team or Enterprise plan](https://www.getdbt.com/pricing/). -- If you're using a native [GitLab](/docs/cloud/git/connect-gitlab) integration, you need a paid or self-hosted account which includes support for GitLab webhooks and [project access tokens](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html). With GitLab Free, merge requests will invoke CI jobs but CI status updates (success or failure of the job) will not be reported back to GitLab. -- If you previously configured your dbt project by providing a generic git URL that clones using SSH, you must reconfigure the project to connect through dbt Cloud's native integration. +- Set up a [connection with your Git provider](/docs/cloud/git/git-configuration-in-dbt-cloud). This integration lets dbt Cloud run jobs on your behalf for job triggering. 
+- For the [Concurrent CI checks](/docs/deploy/continuous-integration#concurrent-ci-checks) and [Smart cancellation of stale builds](/docs/deploy/continuous-integration#smart-cancellation) features: + - Your dbt Cloud account must be on the [Team or Enterprise plan](https://www.getdbt.com/pricing/). + - You are using one of these native Git integrations with dbt Cloud: [GitHub](/docs/cloud/git/connect-github), [GitLab](/docs/cloud/git/connect-gitlab), or [Azure DevOps](/docs/cloud/git/connect-azure-devops). If you have a GitLab integration, you need a paid or self-hosted account which includes support for GitLab webhooks and [project access tokens](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html). With GitLab Free, merge requests will invoke CI jobs but CI status updates (success or failure of the job) will not be reported back to GitLab. To make CI job creation easier, many options on the **CI job** page are set to default values that dbt Labs recommends that you use. If you don't want to use the defaults, you can change them. @@ -111,22 +112,6 @@ If you're experiencing any issues, review some of the common questions and answe -
- Reconnecting your dbt project to use dbt Cloud's native integration with GitHub, GitLab, or Azure DevOps -
-
If your dbt project relies on the generic git clone method that clones using SSH and deploy keys to connect to your dbt repo, you need to disconnect your repo and reconnect it using the native GitHub, GitLab, or Azure DevOps integration in order to enable dbt Cloud CI.



- First, make sure you have the native GitHub authentication, native GitLab authentication, or native Azure DevOps authentication set up depending on which git provider you use. After you have gone through those steps, go to Account Settings, select Projects and click on the project you'd like to reconnect through native GitHub, GitLab, or Azure DevOps auth. Then click on the repository link.



- - Once you're in the repository page, select Edit and then Disconnect Repository at the bottom.

- -

- Confirm that you'd like to disconnect your repository. You should then see a new Configure a repository link in your old repository's place. Click through to the configuration page:

- -

- - Select the GitHub, GitLab, or AzureDevOps tab and reselect your repository. That should complete the setup of the project and enable you to set up a dbt Cloud CI job.
-
-
Error messages that refer to schemas from previous PRs
diff --git a/website/docs/docs/deploy/continuous-integration.md b/website/docs/docs/deploy/continuous-integration.md index 0f87965aada..90a56d47aea 100644 --- a/website/docs/docs/deploy/continuous-integration.md +++ b/website/docs/docs/deploy/continuous-integration.md @@ -32,7 +32,6 @@ The [dbt Cloud scheduler](/docs/deploy/job-scheduler) executes CI jobs different - **Concurrent CI checks** — CI runs triggered by the same dbt Cloud CI job execute concurrently (in parallel), when appropriate - **Smart cancellation of stale builds** — Automatically cancels stale, in-flight CI runs when there are new commits to the PR -- **Run slot treatment** — CI runs don't consume a run slot ### Concurrent CI checks From ecf9427bd671235ed72abf8f4b17a92bad6e7ff7 Mon Sep 17 00:00:00 2001 From: Ly Nguyen <107218380+nghi-ly@users.noreply.github.com> Date: Mon, 8 Jan 2024 11:10:17 -0800 Subject: [PATCH 24/56] Update website/docs/docs/deploy/ci-jobs.md --- website/docs/docs/deploy/ci-jobs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/deploy/ci-jobs.md b/website/docs/docs/deploy/ci-jobs.md index c23ce78bceb..c46a79f4f86 100644 --- a/website/docs/docs/deploy/ci-jobs.md +++ b/website/docs/docs/deploy/ci-jobs.md @@ -17,7 +17,7 @@ dbt Labs recommends that you create your CI job in a dedicated dbt Cloud [deploy - Set up a [connection with your Git provider](/docs/cloud/git/git-configuration-in-dbt-cloud). This integration lets dbt Cloud run jobs on your behalf for job triggering. - For the [Concurrent CI checks](/docs/deploy/continuous-integration#concurrent-ci-checks) and [Smart cancellation of stale builds](/docs/deploy/continuous-integration#smart-cancellation) features: - Your dbt Cloud account must be on the [Team or Enterprise plan](https://www.getdbt.com/pricing/). 
- - You are using one of these native Git integrations with dbt Cloud: [GitHub](/docs/cloud/git/connect-github), [GitLab](/docs/cloud/git/connect-gitlab), or [Azure DevOps](/docs/cloud/git/connect-azure-devops). If you have a GitLab integration, you need a paid or self-hosted account which includes support for GitLab webhooks and [project access tokens](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html). With GitLab Free, merge requests will invoke CI jobs but CI status updates (success or failure of the job) will not be reported back to GitLab. + - You must be using one of these native Git integrations with dbt Cloud: [GitHub](/docs/cloud/git/connect-github), [GitLab](/docs/cloud/git/connect-gitlab), or [Azure DevOps](/docs/cloud/git/connect-azure-devops). If you have a GitLab integration, you need a paid or self-hosted account which includes support for GitLab webhooks and [project access tokens](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html). With GitLab Free, merge requests will invoke CI jobs but CI status updates (success or failure of the job) will not be reported back to GitLab. To make CI job creation easier, many options on the **CI job** page are set to default values that dbt Labs recommends that you use. If you don't want to use the defaults, you can change them. 
From 3ceb5dfa494c3178fda68914f72d62c48941ac41 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Wed, 10 Jan 2024 17:06:25 +0000 Subject: [PATCH 25/56] add hover --- website/src/components/lightbox/index.js | 15 ++++++++++++++- website/src/components/lightbox/styles.module.css | 7 +++++++ 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/website/src/components/lightbox/index.js b/website/src/components/lightbox/index.js index 1c748bbb04f..1bf6b9b53fc 100644 --- a/website/src/components/lightbox/index.js +++ b/website/src/components/lightbox/index.js @@ -1,4 +1,4 @@ -import React from 'react'; +import React, { useState } from 'react'; import styles from './styles.module.css'; import imageCacheWrapper from '../../../functions/image-cache-wrapper'; @@ -10,6 +10,7 @@ function Lightbox({ title = undefined, width = undefined, }) { + const [isHovered, setIsHovered] = useState(false); // Set alignment class if alignment prop used let imageAlignment = '' @@ -19,6 +20,15 @@ function Lightbox({ imageAlignment = styles.rightAlignLightbox } + // Event handlers for mouse enter and leave + const handleMouseEnter = () => { + setIsHovered(true); + }; + + const handleMouseLeave = () => { + setIsHovered(false); + }; + return ( <> @@ -27,8 +37,11 @@ function Lightbox({ ${styles.docImage} ${collapsed ? styles.collapsed : ''} ${imageAlignment} + ${isHovered ? 
styles.hovered : ''} // Add class for hover state `} style={width && {maxWidth: width}} + onMouseEnter={handleMouseEnter} + onMouseLeave={handleMouseLeave} > diff --git a/website/src/components/lightbox/styles.module.css b/website/src/components/lightbox/styles.module.css index af0bb086cf5..29314a6b7aa 100644 --- a/website/src/components/lightbox/styles.module.css +++ b/website/src/components/lightbox/styles.module.css @@ -24,3 +24,10 @@ .rightAlignLightbox { margin: 10px 0 10px auto; } + +.hovered img { + /* Example: Scale up the image on hover */ + transform: scale(1.3); + transition: transform 0.5s ease; +} + From ad515b5f51c8c239c9bf5e9860fafad868de7f68 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Thu, 11 Jan 2024 13:33:21 +0000 Subject: [PATCH 26/56] update --- contributing/content-style-guide.md | 2 +- website/src/components/lightbox/index.js | 35 +++++++++++++++---- .../src/components/lightbox/styles.module.css | 4 +-- 3 files changed, 31 insertions(+), 10 deletions(-) diff --git a/contributing/content-style-guide.md b/contributing/content-style-guide.md index 4ebbf83bf5f..357f8f0d751 100644 --- a/contributing/content-style-guide.md +++ b/contributing/content-style-guide.md @@ -56,7 +56,7 @@ docs.getdbt.com uses its own CSS, and Docusaurus supports its own specific Markd | Link - topic in different folder | `[Title](/folder/file-name) without file extension`* | | Link - section in topic in same folder | `[Title](/folder/file-name#section-name)`* | | Link - section in topic in different folder | `[Title](/folder/file-name#section-name)`* | -| Image | ``| +| Image | ``| *docs.getdbt.com uses specific folders when linking to topics or sections. 
A successful link syntax begins with one of the following folder paths: diff --git a/website/src/components/lightbox/index.js b/website/src/components/lightbox/index.js index 1bf6b9b53fc..c2951057b57 100644 --- a/website/src/components/lightbox/index.js +++ b/website/src/components/lightbox/index.js @@ -1,4 +1,4 @@ -import React, { useState } from 'react'; +import React, { useState, useRef, useEffect } from 'react'; import styles from './styles.module.css'; import imageCacheWrapper from '../../../functions/image-cache-wrapper'; @@ -11,22 +11,41 @@ function Lightbox({ width = undefined, }) { const [isHovered, setIsHovered] = useState(false); + const [hoverStyle, setHoverStyle] = useState({}); + const imgRef = useRef(null); + + useEffect(() => { + if (imgRef.current && !width) { + const naturalWidth = imgRef.current.naturalWidth; + const naturalHeight = imgRef.current.naturalHeight; + + // Calculate the expanded size for images without a specified width + const expandedWidth = naturalWidth * 1.2; // Example: 20% increase + const expandedHeight = naturalHeight * 1.2; + + setHoverStyle({ + width: `${expandedWidth}px`, + height: `${expandedHeight}px`, + transition: 'width 0.5s ease, height 0.5s ease', + }); + } + }, [width]); // Set alignment class if alignment prop used - let imageAlignment = '' + let imageAlignment = ''; if(alignment === "left") { - imageAlignment = styles.leftAlignLightbox + imageAlignment = styles.leftAlignLightbox; } else if(alignment === "right") { - imageAlignment = styles.rightAlignLightbox + imageAlignment = styles.rightAlignLightbox; } - // Event handlers for mouse enter and leave const handleMouseEnter = () => { setIsHovered(true); }; const handleMouseLeave = () => { setIsHovered(false); + setHoverStyle({}); }; return ( @@ -37,7 +56,7 @@ function Lightbox({ ${styles.docImage} ${collapsed ? styles.collapsed : ''} ${imageAlignment} - ${isHovered ? styles.hovered : ''} // Add class for hover state + ${isHovered ? 
styles.hovered : ''} `} style={width && {maxWidth: width}} onMouseEnter={handleMouseEnter} @@ -46,15 +65,17 @@ function Lightbox({ {alt {title && ( - { title } + {title} )} diff --git a/website/src/components/lightbox/styles.module.css b/website/src/components/lightbox/styles.module.css index 29314a6b7aa..b101178825c 100644 --- a/website/src/components/lightbox/styles.module.css +++ b/website/src/components/lightbox/styles.module.css @@ -25,9 +25,9 @@ margin: 10px 0 10px auto; } -.hovered img { - /* Example: Scale up the image on hover */ +.hovered { transform: scale(1.3); transition: transform 0.5s ease; } + From e62b36be46d07069a8a9eee949f06e3b573cc0fc Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Thu, 11 Jan 2024 17:06:02 +0000 Subject: [PATCH 27/56] add widths and hover code --- ...022-11-22-move-spreadsheets-to-your-dwh.md | 10 +- .../blog/2022-11-30-dbt-project-evaluator.md | 4 +- .../blog/2023-01-17-grouping-data-tests.md | 4 +- ...01-ingestion-time-partitioning-bigquery.md | 2 +- website/blog/2023-03-23-audit-helper.md | 16 +-- website/blog/2023-04-17-dbt-squared.md | 6 +- ...ng-a-kimball-dimensional-model-with-dbt.md | 22 ++-- ...23-04-24-framework-refactor-alteryx-dbt.md | 10 +- ...odeling-ragged-time-varying-hierarchies.md | 2 +- .../2023-05-04-generating-dynamic-docs.md | 10 +- ...orical-user-segmentation-model-with-dbt.md | 6 +- ...023-07-03-data-vault-2-0-with-dbt-cloud.md | 4 +- website/blog/2023-07-17-GPT-and-dbt-test.md | 6 +- ...023-08-01-announcing-materialized-views.md | 2 +- .../2023-11-14-specify-prod-environment.md | 2 +- ...-12-11-semantic-layer-on-semantic-layer.md | 8 +- .../blog/2024-01-09-defer-in-development.md | 14 +-- .../clone-incremental-models.md | 6 +- .../dbt-unity-catalog-best-practices.md | 4 +- website/docs/docs/build/about-metricflow.md | 2 +- .../docs/docs/build/custom-target-names.md | 4 +- website/docs/docs/build/data-tests.md | 2 +- .../docs/docs/build/environment-variables.md | 20 ++-- 
website/docs/docs/build/exposures.md | 4 +- website/docs/docs/build/python-models.md | 4 +- website/docs/docs/build/semantic-models.md | 2 +- website/docs/docs/build/sources.md | 4 +- website/docs/docs/build/sql-models.md | 2 +- .../docs/cloud/about-cloud-develop-defer.md | 2 +- .../about-connections.md | 2 +- .../connect-apache-spark.md | 2 +- .../connect-databricks.md | 2 +- .../connect-redshift-postgresql-alloydb.md | 4 +- .../connect-snowflake.md | 6 +- .../connnect-bigquery.md | 4 +- .../dbt-cloud-ide/develop-in-the-cloud.md | 8 +- .../cloud/dbt-cloud-ide/ide-user-interface.md | 48 ++++---- .../docs/cloud/dbt-cloud-ide/lint-format.md | 20 ++-- .../docs/docs/cloud/git/authenticate-azure.md | 4 +- website/docs/docs/cloud/git/connect-github.md | 8 +- website/docs/docs/cloud/git/connect-gitlab.md | 14 +-- .../cloud/git/import-a-project-by-git-url.md | 12 +- website/docs/docs/cloud/git/setup-azure.md | 14 +-- .../docs/cloud/manage-access/audit-log.md | 6 +- .../cloud/manage-access/auth0-migration.md | 26 ++--- .../manage-access/cloud-seats-and-users.md | 16 +-- .../manage-access/enterprise-permissions.md | 4 +- .../docs/cloud/manage-access/invite-users.md | 12 +- .../manage-access/set-up-bigquery-oauth.md | 10 +- .../manage-access/set-up-databricks-oauth.md | 4 +- .../manage-access/set-up-snowflake-oauth.md | 4 +- .../set-up-sso-google-workspace.md | 10 +- .../manage-access/set-up-sso-saml-2.0.md | 66 ++++------- .../docs/cloud/manage-access/sso-overview.md | 2 +- .../docs/docs/cloud/secure/ip-restrictions.md | 4 +- .../docs/cloud/secure/postgres-privatelink.md | 4 +- .../docs/cloud/secure/redshift-privatelink.md | 14 +-- .../cloud/secure/snowflake-privatelink.md | 2 +- .../docs/docs/cloud/secure/vcs-privatelink.md | 10 +- .../cloud-build-and-view-your-docs.md | 8 +- .../docs/docs/collaborate/documentation.md | 8 +- .../collaborate/explore-multiple-projects.md | 4 +- .../collaborate/git/managed-repository.md | 2 +- .../docs/collaborate/git/merge-conflicts.md | 10 
+- .../docs/docs/collaborate/git/pr-template.md | 2 +- .../docs/collaborate/model-performance.md | 6 +- .../collaborate/project-recommendations.md | 4 +- .../docs/docs/dbt-cloud-apis/discovery-api.md | 8 +- .../docs/dbt-cloud-apis/discovery-querying.md | 4 +- .../discovery-use-cases-and-examples.md | 10 +- .../docs/dbt-cloud-apis/service-tokens.md | 2 +- .../docs/docs/dbt-cloud-apis/user-tokens.md | 2 +- website/docs/docs/dbt-cloud-environments.md | 2 +- .../73-Jan-2024/partial-parsing.md | 2 +- .../74-Dec-2023/external-attributes.md | 2 +- .../release-notes/75-Nov-2023/repo-caching.md | 2 +- .../76-Oct-2023/native-retry-support-rn.md | 2 +- .../release-notes/76-Oct-2023/sl-ga.md | 2 +- .../77-Sept-2023/ci-updates-phase2-rn.md | 4 +- .../removing-prerelease-versions.md | 2 +- .../release-notes/79-July-2023/faster-run.md | 4 +- .../80-June-2023/lint-format-rn.md | 6 +- .../run-details-and-logs-improvements.md | 2 +- .../81-May-2023/run-history-endpoint.md | 2 +- .../81-May-2023/run-history-improvements.md | 2 +- .../86-Dec-2022/new-jobs-default-as-off.md | 2 +- .../92-July-2022/render-lineage-feature.md | 2 +- .../95-March-2022/ide-timeout-message.md | 2 +- .../95-March-2022/prep-and-waiting-time.md | 2 +- .../dbt-versions/upgrade-core-in-cloud.md | 6 +- website/docs/docs/deploy/artifacts.md | 8 +- website/docs/docs/deploy/ci-jobs.md | 14 +-- .../docs/deploy/continuous-integration.md | 6 +- .../docs/deploy/dashboard-status-tiles.md | 12 +- .../docs/docs/deploy/deploy-environments.md | 20 ++-- website/docs/docs/deploy/deploy-jobs.md | 6 +- .../docs/docs/deploy/deployment-overview.md | 6 +- website/docs/docs/deploy/deployment-tools.md | 6 +- website/docs/docs/deploy/job-commands.md | 2 +- website/docs/docs/deploy/job-notifications.md | 6 +- website/docs/docs/deploy/job-scheduler.md | 4 +- website/docs/docs/deploy/monitor-jobs.md | 6 +- website/docs/docs/deploy/retry-jobs.md | 2 +- website/docs/docs/deploy/run-visibility.md | 6 +- 
website/docs/docs/deploy/source-freshness.md | 6 +- .../using-the-dbt-ide.md | 8 +- .../use-dbt-semantic-layer/sl-architecture.md | 2 +- website/docs/faqs/API/rotate-token.md | 2 +- .../faqs/Accounts/change-users-license.md | 4 +- .../Accounts/cloud-upgrade-instructions.md | 6 +- website/docs/faqs/Accounts/delete-users.md | 6 +- .../Environments/custom-branch-settings.md | 2 +- website/docs/faqs/Git/git-migration.md | 2 +- website/docs/faqs/Git/gitignore.md | 8 +- website/docs/faqs/Project/delete-a-project.md | 4 +- .../docs/faqs/Troubleshooting/gitignore.md | 12 +- website/docs/guides/adapter-creation.md | 14 +-- website/docs/guides/bigquery-qs.md | 4 +- website/docs/guides/codespace-qs.md | 2 +- website/docs/guides/custom-cicd-pipelines.md | 6 +- website/docs/guides/databricks-qs.md | 32 +++--- website/docs/guides/dbt-python-snowpark.md | 106 +++++++++--------- website/docs/guides/dremio-lakehouse.md | 10 +- website/docs/guides/manual-install-qs.md | 10 +- website/docs/guides/microsoft-fabric-qs.md | 4 +- ...oductionize-your-dbt-databricks-project.md | 2 +- website/docs/guides/redshift-qs.md | 20 ++-- website/docs/guides/refactoring-legacy-sql.md | 4 +- website/docs/guides/set-up-ci.md | 6 +- website/docs/guides/snowflake-qs.md | 24 ++-- website/docs/guides/starburst-galaxy-qs.md | 10 +- website/docs/reference/commands/clone.md | 2 +- .../node-selection/graph-operators.md | 2 +- .../resource-configs/bigquery-configs.md | 2 +- .../resource-configs/persist_docs.md | 4 +- .../resource-configs/spark-configs.md | 2 +- .../resource-properties/description.md | 2 +- website/docs/terms/dag.md | 6 +- website/docs/terms/data-lineage.md | 4 +- website/snippets/_cloud-environments-info.md | 6 +- website/snippets/_new-sl-setup.md | 6 +- website/snippets/_sl-run-prod-job.md | 2 +- .../intro-build-models-atop-other-models.md | 2 +- website/src/components/lightbox/index.js | 61 +++++----- .../src/components/lightbox/styles.module.css | 10 +- 145 files changed, 568 insertions(+), 
607 deletions(-) diff --git a/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md b/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md index 93cf91efeed..09274b41a9b 100644 --- a/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md +++ b/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md @@ -70,9 +70,9 @@ An obvious choice if you have data to load into your warehouse would be your exi [Fivetran’s browser uploader](https://fivetran.com/docs/files/browser-upload) does exactly what it says on the tin: you upload a file to their web portal and it creates a table containing that data in a predefined schema in your warehouse. With a visual interface to modify data types, it’s easy for anyone to use. And with an account type with the permission to only upload files, you don’t need to worry about your stakeholders accidentally breaking anything either. - + - + A nice benefit of the uploader is support for updating data in the table over time. If a file with the same name and same columns is uploaded, any new records will be added, and existing records (per the ) will be updated. @@ -100,7 +100,7 @@ The main benefit of connecting to Google Sheets instead of a static spreadsheet Instead of syncing all cells in a sheet, you create a [named range](https://fivetran.com/docs/files/google-sheets/google-sheets-setup-guide) and connect Fivetran to that range. Each Fivetran connector can only read a single range—if you have multiple tabs then you’ll need to create multiple connectors, each with its own schema and table in the target warehouse. When a sync takes place, it will [truncate](https://docs.getdbt.com/terms/ddl#truncate) and reload the table from scratch as there is no primary key to use for matching. - + Beware of inconsistent data types though—if someone types text into a column that was originally numeric, Fivetran will automatically convert the column to a string type which might cause issues in your downstream transformations. 
[The recommended workaround](https://fivetran.com/docs/files/google-sheets#typetransformationsandmapping) is to explicitly cast your types in [staging models](https://docs.getdbt.com/best-practices/how-we-structure/2-staging) to ensure that any undesirable records are converted to null. @@ -119,7 +119,7 @@ Beware of inconsistent data types though—if someone types text into a column t I’m a big fan of [Fivetran’s Google Drive connector](https://fivetran.com/docs/files/google-drive); in the past I’ve used it to streamline a lot of weekly reporting. It allows stakeholders to use a tool they’re already familiar with (Google Drive) instead of dealing with another set of credentials. Every file uploaded into a specific folder on Drive (or [Box, or consumer Dropbox](https://fivetran.com/docs/files/magic-folder)) turns into a table in your warehouse. - + Like the Google Sheets connector, the data types of the columns are determined automatically. Dates, in particular, are finicky though—if you can control your input data, try to get it into [ISO 8601 format](https://xkcd.com/1179/) to minimize the amount of cleanup you have to do on the other side. @@ -174,7 +174,7 @@ Each of the major data warehouses also has native integrations to import spreads Snowflake’s options are robust and user-friendly, offering both a [web-based loader](https://docs.snowflake.com/en/user-guide/data-load-web-ui.html) as well as [a bulk importer](https://docs.snowflake.com/en/user-guide/data-load-bulk.html). The web loader is suitable for small to medium files (up to 50MB) and can be used for specific files, all files in a folder, or files in a folder that match a given pattern. It’s also the most provider-agnostic, with support for Amazon S3, Google Cloud Storage, Azure and the local file system. 
- + ### BigQuery diff --git a/website/blog/2022-11-30-dbt-project-evaluator.md b/website/blog/2022-11-30-dbt-project-evaluator.md index 3ea7a459c35..b936d4786cd 100644 --- a/website/blog/2022-11-30-dbt-project-evaluator.md +++ b/website/blog/2022-11-30-dbt-project-evaluator.md @@ -20,7 +20,7 @@ If you attended [Coalesce 2022](https://www.youtube.com/watch?v=smbRwmcM1Ok), yo Don’t believe me??? Here’s photographic proof. - + Since the inception of dbt Labs, our team has been embedded with a variety of different data teams — from an over-stretched-data-team-of-one to a data-mesh-multiverse. @@ -120,4 +120,4 @@ If something isn’t working quite right or you have ideas for future functional Together, we can ensure that dbt projects across the galaxy are set up for success as they grow to infinity and beyond. - + diff --git a/website/blog/2023-01-17-grouping-data-tests.md b/website/blog/2023-01-17-grouping-data-tests.md index 23fcce6d27e..3648837302b 100644 --- a/website/blog/2023-01-17-grouping-data-tests.md +++ b/website/blog/2023-01-17-grouping-data-tests.md @@ -43,11 +43,11 @@ So what do we discover when we validate our data by group? Testing for monotonicity, we find many poorly behaved turnstiles. Unlike the well-behaved dark blue line, other turnstiles seem to _decrement_ versus _increment_ with each rotation while still others cyclically increase and plummet to zero – perhaps due to maintenance events, replacements, or glitches in communication with the central server. - + Similarly, while no expected timestamp is missing from the data altogether, a more rigorous test of timestamps _by turnstile_ reveals between roughly 50-100 missing observations for any given period. 
- + _Check out this [GitHub gist](https://gist.github.com/emilyriederer/4dcc6a05ea53c82db175e15f698a1fb6) to replicate these views locally._ diff --git a/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md b/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md index 99ce142d5ed..51a62006ee8 100644 --- a/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md +++ b/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md @@ -125,7 +125,7 @@ In both cases, the operation can be done on a single partition at a time so it r On a 192 GB partition here is how the different methods compare: - + Also, the `SELECT` statement consumed more than 10 hours of slot time while `MERGE` statement took days of slot time. diff --git a/website/blog/2023-03-23-audit-helper.md b/website/blog/2023-03-23-audit-helper.md index 8599ad5eb5d..106715c5e4f 100644 --- a/website/blog/2023-03-23-audit-helper.md +++ b/website/blog/2023-03-23-audit-helper.md @@ -19,7 +19,7 @@ It is common for analytics engineers (AE) and data analysts to have to refactor Not only is that approach time-consuming, but it is also prone to naive assumptions that values match based on aggregate measures (such as counts or sums). To provide a better, more accurate approach to auditing, dbt Labs has created the `audit_helper` package. `audit_helper` is a package for dbt whose main purpose is to audit data by comparing two tables (the original one versus a refactored model). It uses a simple and intuitive query structure that enables quickly comparing tables based on the column values, row amount, and even column types (for example, to make sure that a given column is numeric in both your table and the original one). Figure 1 graphically displays the workflow and where `audit_helper` is positioned in the refactoring process. 
- + Now that it is clear where the `audit_helper` package is positioned in the refactoring process, it is important to highlight the benefits of using audit_helper (and ultimately, of auditing refactored models). Among the benefits, we can mention: - **Quality assurance**: Assert that a refactored model is reaching the same output as the original model that is being refactored. @@ -57,12 +57,12 @@ According to the `audit_helper` package documentation, this macro comes in handy ### How it works When you run the dbt audit model, it will compare all columns, row by row. To count for the match, every column in a row from one source must exactly match a row from another source, as illustrated in the example in Figure 2 below: - + As shown in the example, the model is compared line by line, and in this case, all lines in both models are equivalent and the result should be 100%. Figure 3 below depicts a row in which two of the three columns are equal and only the last column of row 1 has divergent values. In this case, despite the fact that most of row 1 is identical, that row will not be counted towards the final result. In this example, only row 2 and row 3 are valid, yielding a 66.6% match in the total of analyzed rows. - + As previously stated, for the match to be valid, all column values of a model’s row must be equal to the other model. This is why we sometimes need to exclude columns from the comparison (such as date columns, which can have a time zone difference from the original model to the refactored — we will discuss tips like these below). @@ -103,12 +103,12 @@ Let’s understand the arguments used in the `compare_queries` macro: - `summarize` (optional): This argument allows you to switch between a summary or detailed (verbose) view of the compared data. This argument accepts true or false values (its default is set to be true). 3. 
Replace the sources from the example with your own - + As illustrated in Figure 4, using the `ref` statements allows you to easily refer to your development model, and using the full path makes it easy to refer to the original table (which will be useful when you are refactoring a SQL Server Stored Procedure or Alteryx Workflow that is already being materialized in the data warehouse). 4. Specify your comparison columns - + Delete the example columns and replace them with the columns of your models, exactly as they are written in each model. You should rename/alias the columns to match, as well as ensuring they are in the same order within the `select` clauses. @@ -129,7 +129,7 @@ Let’s understand the arguments used in the `compare_queries` macro: ``` The output will be the similar to the one shown in Figure 6 below: - +
The output is presented in table format, with each column explained below:
@@ -155,7 +155,7 @@ While we can surely rely on that overview to validate the final refactored model A really useful way to check out which specific columns are driving down the match percentage between tables is the `compare_column_values` macro that allows us to audit column values. This macro requires a column to be set, so it can be used as an anchor to compare entries between the refactored dbt model column and the legacy table column. Figure 7 illustrates how the `compare_column_value`s macro works. - + The macro’s output summarizes the status of column compatibility, breaking it down into different categories: perfect match, both are null, values do not match, value is null in A only, value is null in B only, missing from A and missing from B. This level of detailing makes it simpler for the AE or data analyst to figure out what can be causing incompatibility issues between the models. While refactoring a model, it is common that some keys used to join models are inconsistent, bringing up unwanted null values on the final model as a result, and that would cause the audit row query to fail, without giving much more detail. @@ -224,7 +224,7 @@ Also, we can see that the example code includes a table printing option enabled But unlike from the `compare_queries` macro, if you have kept the printing function enabled, you should expect a table to be printed in the command line when you run the model, as shown in Figure 8. 
Otherwise, it will be materialized on your data warehouse like this: - + The `compare_column_values` macro separates column auditing results in seven different labels: - **Perfect match**: count of rows (and relative percentage) where the column values compared between both tables are equal and not null; diff --git a/website/blog/2023-04-17-dbt-squared.md b/website/blog/2023-04-17-dbt-squared.md index 5cac73459a8..4050e52c690 100644 --- a/website/blog/2023-04-17-dbt-squared.md +++ b/website/blog/2023-04-17-dbt-squared.md @@ -50,11 +50,11 @@ We needed a way to make this architecture manageable when dozens of downstream t The second architectural decision was whether or not to create a single dbt project for all 50+ country teams, or to follow a multi-project approach in which each country would have its own separate dbt project in the shared repo. It was critical that each country team was able to move at different paces and have full control over their domains. This would avoid issues like model name collisions across countries and remove dependencies that would risk cascading errors between countries. Therefore, we opted for a one project per country approach. - + The resulting data flow from core to country teams now follows this pattern. The *Sources* database holds all of the raw data in the Redshift cluster and the *Integrated* database contains the curated and ready-for-consumption outputs from the core dbt project. These outputs are termed Source Data Products (SDPs). These SDPs are then leveraged by the core team to build Global Data Products—products tailored to answering business questions for global stakeholders. They are also filtered at the country-level and used as sources to the country-specific Data Products managed by the country teams. These, in turn, are hosted in the respective `affiliate_db_` database. Segregating at the database-level facilitates data governance and privacy management. 
- + ### People @@ -68,7 +68,7 @@ The success of this program relied on adopting DevOps practices from the start. Often overlooked, this third pillar of process can be the key to success when scaling a global platform. Simple things, such as accounting for time zone differences, can determine whether a message gets across the board. To facilitate the communication and coordination between Global and Country teams, all the teams follow the same sprint cycle, and we hold weekly scrum of scrums. We needed to set up extensive onboarding documentation, ensure newcomers had proper training and guidance, and create dedicated slack channels for announcements, incident reporting, and occasional random memes, helping build a community that stretches from Brazil to Malaysia. - + ## The solution: dbt Squared diff --git a/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md b/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md index ab364749eff..eb357698d4b 100644 --- a/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md +++ b/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md @@ -14,7 +14,7 @@ is_featured: true Dimensional modeling is one of many data modeling techniques that are used by data practitioners to organize and present data for analytics. Other data modeling techniques include Data Vault (DV), Third Normal Form (3NF), and One Big Table (OBT) to name a few. - + While the relevance of dimensional modeling [has been debated by data practitioners](https://discourse.getdbt.com/t/is-kimball-dimensional-modeling-still-relevant-in-a-modern-data-warehouse/225/6), it is still one of the most widely adopted data modeling technique for analytics. @@ -39,7 +39,7 @@ Dimensional modeling is a technique introduced by Ralph Kimball in 1996 with his The goal of dimensional modeling is to take raw data and transform it into Fact and Dimension tables that represent the business. 
- + The benefits of dimensional modeling are: @@ -143,7 +143,7 @@ Examine the database source schema below, paying close attention to: - Keys - Relationships - + ### Step 8: Query the tables @@ -185,7 +185,7 @@ Now that you’ve set up the dbt project, database, and have taken a peek at the Identifying the business process is done in collaboration with the business user. The business user has context around the business objectives and business processes, and can provide you with that information. - + Upon speaking with the CEO of AdventureWorks, you learn the following information: @@ -222,11 +222,11 @@ There are two tables in the sales schema that catch our attention. These two tab - The `sales.salesorderheader` table contains information about the credit card used in the order, the shipping address, and the customer. Each record in this table represents an order header that contains one or more order details. - The `sales.salesorderdetail` table contains information about the product that was ordered, and the order quantity and unit price, which we can use to calculate the revenue. Each record in this table represents a single order detail. - + Let’s define a fact table called `fct_sales` which joins `sales.salesorderheader` and `sales.salesorderdetail` together. Each record in the fact table (also known as the [grain](https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/grain/)) is an order detail. - + ### Dimension tables @@ -250,19 +250,19 @@ Based on the business questions that our business user would like answered, we c There are different ways we could create the dimension tables. We could use the existing relationships between the tables as depicted in the diagram below. - + This is known as a snowflake schema design, where the fact table is the centre of the snowflake, and there are many fractals branching off the centre of the snowflake. 
However, this results in many joins that need to be performed by the consumer of the dimensional model. Instead, we can denormalize the dimension tables by performing joins. - + This is known as a star schema and this approach reduces the amount of joins that need to be performed by the consumer of the dimensional model. Using the star schema approach, we can identify 6 dimensions as shown below that will help us answer the business questions: - + - `dim_product` : a dimension table that joins `product` , `productsubcategory`, `productcategory` - `dim_address` : a dimension table that joins `address` , `stateprovince`, `countryregion` @@ -617,7 +617,7 @@ Great work, you have successfully created your very first fact and dimension tab Let’s make it easier for consumers of our dimensional model to understand the relationships between tables by creating an [Entity Relationship Diagram (ERD)](https://www.visual-paradigm.com/guide/data-modeling/what-is-entity-relationship-diagram/). - + The ERD will enable consumers of our dimensional model to quickly identify the keys and relationship type (one-to-one, one-to-many) that need to be used to join tables. @@ -694,7 +694,7 @@ Using `dbt_utils.star()`, we select all columns except the surrogate key columns We can then build the OBT by running `dbt run`. Your dbt DAG should now look like this: - + Congratulations, you have reached the end of this tutorial. If you want to learn more, please see the learning resources below on dimensional modeling. 
diff --git a/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md b/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md index 46cfcb58cdd..2c6a9d87591 100644 --- a/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md +++ b/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md @@ -17,7 +17,7 @@ Alteryx is a visual data transformation platform with a user-friendly interface Transforming data to follow business rules can be a complex task, especially with the increasing amount of data collected by companies. To reduce such complexity, data transformation solutions designed as drag-and-drop tools can be seen as more intuitive, since analysts can visualize the steps taken to transform data. One example of a popular drag-and-drop transformation tool is Alteryx which allows business analysts to transform data by dragging and dropping operators in a canvas. The graphic interface of Alteryx Designer is presented in **Figure 1**. - + Nonetheless, as data workflows become more complex, Alteryx lacks the modularity, documentation, and version control capabilities that these flows require. In this sense, dbt may be a more suitable solution to building resilient and modular data pipelines due to its focus on data modeling. @@ -62,7 +62,7 @@ This blog post reports a consulting project for a major client at Indicium Tech When the client hired Indicium, they had dozens of Alteryx workflows built and running daily solely for the marketing team, which was the focus of the project. For the marketing team, the Alteryx workflows had to be executed in the correct order since they were interdependent, which means one Alteryx workflow used the outcome of the previous one, and so on. The main Alteryx workflows run daily by the marketing team took about 6 hours to run. Another important aspect to consider was that if a model had not finished running when the next one downstream began to run, the data would be incomplete, requiring the workflow to be run again. 
The execution of all models was usually scheduled to run overnight and by early morning, so the data would be up to date the next day. But if there was an error the night before, the data would be incorrect or out of date. **Figure 3** exemplifies the scheduler. - + Data lineage was a point that added a lot of extra labor because it was difficult to identify which models were dependent on others with so many Alteryx workflows built. When the number of workflows increased, it required a long time to create a view of that lineage in another software. So, if a column's name changed in a model due to a change in the model's source, the marketing analysts would have to map which downstream models were impacted by such change to make the necessary adjustments. Because model lineage was mapped manually, it was a challenge to keep it up to date. @@ -89,7 +89,7 @@ The first step is to validate all data sources and create one com It is essential to click on each data source (the green book icons on the leftmost side of **Figure 5**) and examine whether any transformations have been done inside that data source query. It is very common for a source icon to contain more than one data source or filter, which is why this step is important. The next step is to follow the workflow and transcribe the transformations into SQL queries in the dbt models to replicate the same data transformations as in the Alteryx workflow. - + For this step, we identified which operators were used in the data source (for example, joining data, order columns, group by, etc). Usually the Alteryx operators are pretty self-explanatory and all the information needed for understanding appears on the left side of the menu. We also checked the documentation to understand how each Alteryx operator works behind the scenes. 
@@ -102,7 +102,7 @@ Auditing large models, with sometimes dozens of columns and millions of rows, ca In this project, we used [the `audit_helper` package](https://github.com/dbt-labs/dbt-audit-helper), because it provides more robust auditing macros and offers more automation possibilities for our use case. To that end, we needed to have both the legacy Alteryx workflow output table and the refactored dbt model materialized in the project’s data warehouse. Then we used the macros available in `audit_helper` to compare query results, data types, column values, row numbers and many more things that are available within the package. For an in-depth explanation and tutorial on how to use the `audit_helper` package, check out [this blog post](https://docs.getdbt.com/blog/audit-helper-for-migration). **Figure 6** graphically illustrates the validation logic behind audit_helper. - + #### Step 4: Duplicate reports and connect them to the dbt refactored models @@ -120,7 +120,7 @@ The conversion proved to be of great value to the client due to three main aspec - Improved workflow visibility: dbt’s support for documentation and testing, associated with dbt Cloud, allows for great visibility of the workflow’s lineage execution, accelerating errors and data inconsistencies identification and troubleshooting. More than once, our team was able to identify the impact of one column’s logic alteration in downstream models much earlier than these Alteryx models. - Workflow simplification: dbt’s modularized approach of data modeling, aside from accelerating total run time of the data workflow, simplified the construction of new tables, based on the already existing modules, and improved code readability. - + As we can see, refactoring Alteryx to dbt was an important step in the direction of data availability, and allowed for much more agile processes for the client’s data team. 
With less time dedicated to manually executing sequential Alteryx workflows that took hours to complete, and searching for errors in each individual file, the analysts could focus on what they do best: **getting insights from the data and generating value from them**. diff --git a/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md b/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md index f719bdb40cb..2b00787cc07 100644 --- a/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md +++ b/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md @@ -22,7 +22,7 @@ To help visualize this data, we're going to pretend we are a company that manufa Obviously, a real bike could have a hundred or more separate components. To keep things simple for this article, let's just consider the bike, the frame, a wheel, the wheel rim, tire, and tube. Our component hierarchy looks like: - + This hierarchy is *ragged* because different paths through the hierarchy terminate at different depths. It is *time-varying* because specific components can be added and removed. diff --git a/website/blog/2023-05-04-generating-dynamic-docs.md b/website/blog/2023-05-04-generating-dynamic-docs.md index 1e704178b0a..b6e8d929e72 100644 --- a/website/blog/2023-05-04-generating-dynamic-docs.md +++ b/website/blog/2023-05-04-generating-dynamic-docs.md @@ -35,7 +35,7 @@ This results in a lot of the same columns (e.g. `account_id`) existing in differ In fact, I found a better way using some CLI commands, the dbt Codegen package and docs blocks. I also made the following meme in the [dbt Community Slack](https://www.getdbt.com/community/join-the-community/) channel #memes-and-off-topic-chatter to encapsulate this method: - + ## What pain is being solved? 
@@ -279,7 +279,7 @@ To confirm the formatting works, run the following command to get dbt Docs up an ``` $ dbt docs && dbt docs serve ``` - + Here, you can confirm that the column descriptions using the doc blocks are working as intended. @@ -326,7 +326,7 @@ user_id ``` Now, open your code editor, and replace `(.*)` with `{% docs column__activity_based_interest__$1 %}\n\n{% enddocs %}\n`, which will result in the following in your markdown file: - + Now you can add documentation to each of your columns. @@ -334,7 +334,7 @@ Now you can add documentation to each of your columns. You can programmatically identify all columns, and have them point towards the newly-created documentation. In your code editor, replace `\s{6}- name: (.*)\n description: ""` with ` - name: $1\n description: "{{ doc('column__activity_based_interest__$1') }}`: - + ⚠️ Some of your columns may already be available in existing docs blocks. In this example, the following replacements are done: - `{{ doc('column__activity_based_interest__user_id') }}` → `{{ doc("column_user_id") }}` @@ -343,7 +343,7 @@ You can programmatically identify all columns, and have them point towards the n ## Check that everything works Run `dbt docs generate`. If there are syntax errors, this will be found out at this stage. If successful, we can run `dbt docs serve` to perform a smoke test and ensure everything looks right: - + ## Additional considerations diff --git a/website/blog/2023-05-08-building-a-historical-user-segmentation-model-with-dbt.md b/website/blog/2023-05-08-building-a-historical-user-segmentation-model-with-dbt.md index a8b0e1f9f8c..ac6aee5176c 100644 --- a/website/blog/2023-05-08-building-a-historical-user-segmentation-model-with-dbt.md +++ b/website/blog/2023-05-08-building-a-historical-user-segmentation-model-with-dbt.md @@ -21,7 +21,7 @@ Take for example a Customer Experience (CX) team that uses Salesforce as a CRM. 
An improvement to this would be to prioritize the tickets based on the customer segment, answering our most valuable customers first. An Analytics Engineer can build a segmentation to identify the power users (for example with an RFM approach) and store it in the data warehouse. The Data Engineering team can then export that user attribute to the CRM, allowing the customer experience team to build rules on top of it. - + ## Problems @@ -58,7 +58,7 @@ The goal of RFM analysis is to segment customers into groups based on how recent We are going to use just the Recency and Frequency matrix, and use the Monetary value as an accessory attribute. This is a common approach in companies where the Frequency and the Monetary Value are highly correlated. - + ### RFM model for current segment @@ -390,7 +390,7 @@ FROM current_segments With the new approach, our dependency graph would look like this: - + - For analysts that want to see how the segments changed over time, they can query the historical model. There is also an option to build an aggregated model before loading it in a Business Intelligence tool. - For ML model training, data scientists and machine learning practitioners can import this model into their notebooks or their feature store, instead of rebuilding the attributes from scratch. diff --git a/website/blog/2023-07-03-data-vault-2-0-with-dbt-cloud.md b/website/blog/2023-07-03-data-vault-2-0-with-dbt-cloud.md index 6b1012a5320..98586b2552c 100644 --- a/website/blog/2023-07-03-data-vault-2-0-with-dbt-cloud.md +++ b/website/blog/2023-07-03-data-vault-2-0-with-dbt-cloud.md @@ -26,7 +26,7 @@ A new development in the city? No problem! Just hook up the new pipes to the res Data Vault is the dam and reservoir: it is the well-engineered data model to structure an organization’s data from source systems for use by downstream data projects – rather than each team collecting data straight from the source. 
The Data Vault data model is designed using a few well-applied principles, and in practice, pools source data so it is available for use by all downstream consumers. This promotes a scalable data warehouse through reusability and modularity. - + ## Data Vault components @@ -139,7 +139,7 @@ Within the [dq_tools](https://hub.getdbt.com/infinitelambda/dq_tools/latest/) _p To help you get started, [we have created a template GitHub project](https://github.com/infinitelambda/dbt-data-vault-template) you can utilize to understand the basic principles of building Data Vault with dbt Cloud using one of the abovementioned packages. But if you need help building your Data Vault, get in touch. - + ### Entity Relation Diagrams (ERDs) and dbt diff --git a/website/blog/2023-07-17-GPT-and-dbt-test.md b/website/blog/2023-07-17-GPT-and-dbt-test.md index 84f756919a5..12e380eb220 100644 --- a/website/blog/2023-07-17-GPT-and-dbt-test.md +++ b/website/blog/2023-07-17-GPT-and-dbt-test.md @@ -55,7 +55,7 @@ We all know how ChatGPT can digest very complex prompts, but as this is a tool f Opening ChatGPT with GPT4, my first prompt is usually along these lines: - + And the output of this simple prompt is nothing short of amazing: @@ -118,7 +118,7 @@ Back in my day (5 months ago), ChatGPT with GPT 3.5 didn’t have much context o A prompt for it would look something like: - + ## Specify details on generic tests in your prompts @@ -133,7 +133,7 @@ Accepted_values and relationships are slightly trickier but the model can be adj One way of doing this is with a prompt like this: - + Which results in the following output: diff --git a/website/blog/2023-08-01-announcing-materialized-views.md b/website/blog/2023-08-01-announcing-materialized-views.md index eb9716e73a5..6534e1d0b56 100644 --- a/website/blog/2023-08-01-announcing-materialized-views.md +++ b/website/blog/2023-08-01-announcing-materialized-views.md @@ -103,7 +103,7 @@ When we talk about using materialized views in development, the 
question to thin Outside of the scheduling part, development will be pretty standard. Your pipeline is likely going to look something like this: - + This is assuming you have a near real time pipeline where you are pulling from a streaming data source like a Kafka Topic via an ingestion tool of your choice like Snowpipe for Streaming into your data platform. After your data is in the data platform, you will: diff --git a/website/blog/2023-11-14-specify-prod-environment.md b/website/blog/2023-11-14-specify-prod-environment.md index c6ad2b31027..ecb6ddc8b25 100644 --- a/website/blog/2023-11-14-specify-prod-environment.md +++ b/website/blog/2023-11-14-specify-prod-environment.md @@ -56,7 +56,7 @@ By using the environment as the arbiter of state, any time a change is made to y ## The easiest way to break apart your jobs {#how} - + For most projects, changing from a job-centric to environment-centric approach to metadata is straightforward and immediately pays dividends as described above. Assuming that your Staging/CI and Production jobs are currently intermingled, you can extricate them as follows: diff --git a/website/blog/2023-12-11-semantic-layer-on-semantic-layer.md b/website/blog/2023-12-11-semantic-layer-on-semantic-layer.md index 44499c51ec5..c5e541358b9 100644 --- a/website/blog/2023-12-11-semantic-layer-on-semantic-layer.md +++ b/website/blog/2023-12-11-semantic-layer-on-semantic-layer.md @@ -31,7 +31,7 @@ There are [plenty of other great resources](https://docs.getdbt.com/docs/build/p - + @@ -42,7 +42,7 @@ Let’s walk through the DAG from left to right: First, we have raw tables from What [is a semantic model](https://docs.getdbt.com/docs/build/semantic-models)? Put simply, semantic models contain the components we need to build metrics. Semantic models are YAML files that live in your dbt project. They contain metadata about your dbt models in a format that MetricFlow, the query builder that powers the semantic layer, can understand. 
The DAG below in [dbt Explorer](https://docs.getdbt.com/docs/collaborate/explore-projects) shows the metrics we’ve built off of `semantic_layer_queries`. - + Let’s dig into semantic models and metrics a bit more, and explain some of the data modeling decisions we made. First, we needed to decide what model to use as a base for our semantic model. We decide to use`fct_semantic_layer`queries as our base model because defining a semantic model on top of a normalized fact table gives us maximum flexibility to join to other tables. This increased the number of dimensions available, which means we can answer more questions. @@ -79,13 +79,13 @@ To query to Semantic Layer you have two paths: you can query metrics directly th The leg work of building our pipeline and defining metrics is all done, which makes last-mile consumption much easier. First, we set up a launch dashboard in Hex as the source of truth for semantic layer product metrics. This tool is used by cross-functional partners like marketing, sales, and the executive team to easily check product and usage metrics like total semantic layer queries, or weekly active semantic layer users. To set up our Hex connection, we simply enter a few details from our dbt Cloud environment and then we can work with metrics directly in Hex notebooks. We can use the JDBC interface, or use Hex’s GUI metric builder to build reports. We run all our WBRs off this dashboard, which allows us to spot trends in consumption and react quickly to changes in our business. - + On the finance and operations side, product usage data is crucial to making informed pricing decisions. All our pricing models are created in spreadsheets, so we leverage the Google Sheets integration to give those teams access to consistent data sets without the need to download CSVs from the Hex dashboard. 
This lets the Pricing team add dimensional slices, like tier and company size, to the data in a self-serve manner without having to request data team resources to generate those insights. This allows our finance team to iteratively build financial models and be more self-sufficient in pulling data, instead of relying on data team resources. - + As a former data scientist and data engineer, I personally think this is a huge improvement over the approach I would have used without the semantic layer. My old approach would have been to materialize One Big Table with all the numeric and categorical columns I needed for analysis. Then write a ton of SQL in Hex or various notebooks to create reports for stakeholders. Inevitably I’m signing up for more development cycles to update the pipeline whenever a new dimension needs to be added or the data needs to be aggregated in a slightly different way. From a data team management perspective, using a central semantic layer saves data analysts cycles since users can more easily self-serve. At every company I’ve ever worked at, data analysts are always in high demand, with more requests than they can reasonably accomplish. This means any time a stakeholder can self-serve their data without pulling us in is a huge win. diff --git a/website/blog/2024-01-09-defer-in-development.md b/website/blog/2024-01-09-defer-in-development.md index 634fd1100c9..a84f6cd3083 100644 --- a/website/blog/2024-01-09-defer-in-development.md +++ b/website/blog/2024-01-09-defer-in-development.md @@ -58,15 +58,15 @@ Let’s think back to the hypothetical above — what if we made use of the Let’s take a look at a simplified example — let’s say your project looks like this in production: - + And you’re tasked with making changes to `model_f`. 
Without defer, you would need to make sure to at minimum execute a `dbt run -s +model_f` to ensure all the upstream dependencies of `model_f` are present in your development schema so that you can start to run `model_f`.* You just spent a whole bunch of time and money duplicating your models, and now your warehouse looks like this: - + With defer, we should not build anything other than the models that have changed, and are now different from their production counterparts! Let’s tell dbt to use production metadata to resolve our refs, and only build the model I have changed — that command would be `dbt run -s model_f --defer` .** - + This results in a *much slimmer build* — we read data in directly from the production version of `model_b` and `model_c`, and don’t have to worry about building anything other than what we selected! @@ -80,7 +80,7 @@ dbt Cloud offers a seamless deferral experience in both the dbt Cloud IDE and th In the dbt Cloud IDE, there’s as simple toggle switch labeled `Defer to production`. Simply enabling this toggle will defer your command to the production environment when you run any dbt command in the IDE! - + The cloud CLI has this setting *on by default* — there’s nothing else you need to do to set this up! If you prefer not to defer, you can pass the `--no-defer` flag to override this behavior. 
You can also set an environment other than your production environment as the deferred to environment in your `dbt-cloud` settings in your `dbt_project.yml` : @@ -100,13 +100,13 @@ One of the major gotchas in the defer workflow is that when you’re in defer mo Let’s take a look at that example above again, and pretend that some time before we went to make this edit, we did some work on `model_c`, and we have a local copy of `model_c` hanging out in our development schema: - + When you run `dbt run -s model_f --defer` , dbt will detect the development copy of `model_c` and say “Hey, y’know, I bet Dave is working on that model too, and he probably wants to make sure his changes to `model_c` work together with his changes to `model_f` . Because I am a kind and benevolent data transformation tool, i’ll make sure his `{{ ref('model_c') }]` function compiles to his development changes!” Thanks dbt! As a result, we’ll effectively see this behavior when we run our command: - + Where our code would compile from @@ -155,6 +155,6 @@ While defer is a faster and cheaper option for most folks in most situations, de ### Call me Willem Defer - + Defer to prod is a powerful way to improve your development velocity with dbt, and dbt Cloud makes it easier than ever to make use of this feature! You too could look this cool while you’re saving time and money developing on your dbt projects! diff --git a/website/docs/best-practices/clone-incremental-models.md b/website/docs/best-practices/clone-incremental-models.md index 11075b92161..99982042de1 100644 --- a/website/docs/best-practices/clone-incremental-models.md +++ b/website/docs/best-practices/clone-incremental-models.md @@ -17,11 +17,11 @@ Imagine you've created a [Slim CI job](/docs/deploy/continuous-integration) in d - Run the command `dbt build --select state:modified+` to run and test all of the models you've modified and their downstream dependencies. 
- Trigger whenever a developer on your team opens a PR against the main branch. - + Now imagine your dbt project looks something like this in the DAG: - + When you open a pull request (PR) that modifies `dim_wizards`, your CI job will kickoff and build _only the modified models and their downstream dependencies_ (in this case, `dim_wizards` and `fct_orders`) into a temporary schema that's unique to your PR. @@ -49,7 +49,7 @@ You'll have two commands for your dbt Cloud CI check to execute: Because of your first clone step, the incremental models selected in your `dbt build` on the second step will run in incremental mode. - + Your CI jobs will run faster, and you're more accurately mimicking the behavior of what will happen once the PR has been merged into main. diff --git a/website/docs/best-practices/dbt-unity-catalog-best-practices.md b/website/docs/best-practices/dbt-unity-catalog-best-practices.md index a55e1d121af..5f230263cf8 100644 --- a/website/docs/best-practices/dbt-unity-catalog-best-practices.md +++ b/website/docs/best-practices/dbt-unity-catalog-best-practices.md @@ -21,11 +21,11 @@ If you use multiple Databricks workspaces to isolate development from production To do so, use dbt's [environment variable syntax](https://docs.getdbt.com/docs/dbt-cloud/using-dbt-cloud/cloud-environment-variables#special-environment-variables) for Server Hostname of your Databricks workspace URL and HTTP Path for the SQL warehouse in your connection settings. Note that Server Hostname still needs to appear to be a valid domain name to pass validation checks, so you will need to hard-code the domain suffix on the URL, eg `{{env_var('DBT_HOSTNAME')}}.cloud.databricks.com` and the path prefix for your warehouses, eg `/sql/1.0/warehouses/{{env_var('DBT_HTTP_PATH')}}`. - + When you create environments in dbt Cloud, you can assign environment variables to populate the connection information dynamically. 
Don’t forget to make sure the tokens you use in the credentials for those environments were generated from the associated workspace. - + ## Access Control diff --git a/website/docs/docs/build/about-metricflow.md b/website/docs/docs/build/about-metricflow.md index ea2efcabf06..e229df2dfc8 100644 --- a/website/docs/docs/build/about-metricflow.md +++ b/website/docs/docs/build/about-metricflow.md @@ -55,7 +55,7 @@ For a semantic model, there are three main pieces of metadata: * [Dimensions](/docs/build/dimensions) — These are the ways you want to group or slice/dice your metrics. * [Measures](/docs/build/measures) — The aggregation functions that give you a numeric result and can be used to create your metrics. - + ### Metrics diff --git a/website/docs/docs/build/custom-target-names.md b/website/docs/docs/build/custom-target-names.md index ac7036de572..4786641678d 100644 --- a/website/docs/docs/build/custom-target-names.md +++ b/website/docs/docs/build/custom-target-names.md @@ -21,9 +21,9 @@ where created_at > date_trunc('month', current_date) To set a custom target name for a job in dbt Cloud, configure the **Target Name** field for your job in the Job Settings page. - + ## dbt Cloud IDE When developing in dbt Cloud, you can set a custom target name in your development credentials. Go to your account (from the gear menu in the top right hand corner), select the project under **Credentials**, and update the target name. - + diff --git a/website/docs/docs/build/data-tests.md b/website/docs/docs/build/data-tests.md index d981d7e272d..7c12e5d7059 100644 --- a/website/docs/docs/build/data-tests.md +++ b/website/docs/docs/build/data-tests.md @@ -245,7 +245,7 @@ Normally, a data test query will calculate failures as part of its execution. 
If This workflow allows you to query and examine failing records much more quickly in development: - + Note that, if you elect to store test failures: * Test result tables are created in a schema suffixed or named `dbt_test__audit`, by default. It is possible to change this value by setting a `schema` config. (For more details on schema naming, see [using custom schemas](/docs/build/custom-schemas).) diff --git a/website/docs/docs/build/environment-variables.md b/website/docs/docs/build/environment-variables.md index 3f2aebd0036..14076352ac1 100644 --- a/website/docs/docs/build/environment-variables.md +++ b/website/docs/docs/build/environment-variables.md @@ -17,7 +17,7 @@ Environment variables in dbt Cloud must be prefixed with either `DBT_` or `DBT_E Environment variable values can be set in multiple places within dbt Cloud. As a result, dbt Cloud will interpret environment variables according to the following order of precedence (lowest to highest): - + There are four levels of environment variables: 1. the optional default argument supplied to the `env_var` Jinja function in code @@ -30,7 +30,7 @@ There are four levels of environment variables: To set environment variables at the project and environment level, click **Deploy** in the top left, then select **Environments**. Click **Environments Variables** to add and update your environment variables. - + @@ -38,7 +38,7 @@ You'll notice there is a `Project Default` column. This is a great place to set To the right of the `Project Default` column are all your environments. Values set at the environment level take priority over the project level default value. This is where you can tell dbt Cloud to interpret an environment value differently in your Staging vs. Production environment, as example. 
- + @@ -48,12 +48,12 @@ You may have multiple jobs that run in the same environment, and you'd like the When setting up or editing a job, you will see a section where you can override environment variable values defined at the environment or project level. - + Every job runs in a specific, deployment environment, and by default, a job will inherit the values set at the environment level (or the highest precedence level set) for the environment in which it runs. If you'd like to set a different value at the job level, edit the value to override it. - + **Overriding environment variables at the personal level** @@ -61,11 +61,11 @@ Every job runs in a specific, deployment environment, and by default, a job will You can also set a personal value override for an environment variable when you develop in the dbt integrated developer environment (IDE). By default, dbt Cloud uses environment variable values set in the project's development environment. To see and override these values, click the gear icon in the top right. Under "Your Profile," click **Credentials** and select your project. Click **Edit** and make any changes in "Environment Variables." - + To supply an override, developers can edit and specify a different value to use. These values will be respected in the IDE both for the Results and Compiled SQL tabs. - + :::info Appropriate coverage If you have not set a project level default value for every environment variable, it may be possible that dbt Cloud does not know how to interpret the value of an environment variable in all contexts. In such cases, dbt will throw a compilation error: "Env var required but not provided". @@ -77,7 +77,7 @@ If you change the value of an environment variable mid-session while using the I To refresh the IDE mid-development, click on either the green 'ready' signal or the red 'compilation error' message at the bottom right corner of the IDE. A new modal will pop up, and you should select the Refresh IDE button. 
This will load your environment variables values into your development environment. - + There are some known issues with partial parsing of a project and changing environment variables mid-session in the IDE. If you find that your dbt project is not compiling to the values you've set, try deleting the `target/partial_parse.msgpack` file in your dbt project which will force dbt to re-compile your whole project. @@ -86,7 +86,7 @@ There are some known issues with partial parsing of a project and changing envir While all environment variables are encrypted at rest in dbt Cloud, dbt Cloud has additional capabilities for managing environment variables with secret or otherwise sensitive values. If you want a particular environment variable to be scrubbed from all logs and error messages, in addition to obfuscating the value in the UI, you can prefix the key with `DBT_ENV_SECRET_`. This functionality is supported from `dbt v1.0` and on. - + **Note**: An environment variable can be used to store a [git token for repo cloning](/docs/build/environment-variables#clone-private-packages). We recommend you make the git token's permissions read only and consider using a machine account or service user's PAT with limited repo access in order to practice good security hygiene. @@ -131,7 +131,7 @@ Currently, it's not possible to dynamically set environment variables across mod **Note** — You can also use this method with Databricks SQL Warehouse. - + :::info Environment variables and Snowflake OAuth limitations Env vars works fine with username/password and keypair, including scheduled jobs, because dbt Core consumes the Jinja inserted into the autogenerated `profiles.yml` and resolves it to do an `env_var` lookup. 
diff --git a/website/docs/docs/build/exposures.md b/website/docs/docs/build/exposures.md index 65c0792e0a0..a26ac10bd36 100644 --- a/website/docs/docs/build/exposures.md +++ b/website/docs/docs/build/exposures.md @@ -118,8 +118,8 @@ dbt test -s +exposure:weekly_jaffle_report When we generate our documentation site, you'll see the exposure appear: - - + + ## Related docs diff --git a/website/docs/docs/build/python-models.md b/website/docs/docs/build/python-models.md index 3fe194a4cb7..b24d3129f0c 100644 --- a/website/docs/docs/build/python-models.md +++ b/website/docs/docs/build/python-models.md @@ -660,7 +660,7 @@ Use the `cluster` submission method with dedicated Dataproc clusters you or your - Enable Dataproc APIs for your project + region - If using the `cluster` submission method: Create or use an existing [Dataproc cluster](https://cloud.google.com/dataproc/docs/guides/create-cluster) with the [Spark BigQuery connector initialization action](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/connectors#bigquery-connectors). (Google recommends copying the action into your own Cloud Storage bucket, rather than using the example version shown in the screenshot) - + The following configurations are needed to run Python models on Dataproc. You can add these to your [BigQuery profile](/docs/core/connect-data-platform/bigquery-setup#running-python-models-on-dataproc) or configure them on specific Python models: - `gcs_bucket`: Storage bucket to which dbt will upload your model's compiled PySpark code. @@ -706,7 +706,7 @@ Google recommends installing Python packages on Dataproc clusters via initializa You can also install packages at cluster creation time by [defining cluster properties](https://cloud.google.com/dataproc/docs/tutorials/python-configuration#image_version_20): `dataproc:pip.packages` or `dataproc:conda.packages`. 
- + **Docs:** - [Dataproc overview](https://cloud.google.com/dataproc/docs/concepts/overview) diff --git a/website/docs/docs/build/semantic-models.md b/website/docs/docs/build/semantic-models.md index 5c6883cdcee..d19354d9199 100644 --- a/website/docs/docs/build/semantic-models.md +++ b/website/docs/docs/build/semantic-models.md @@ -18,7 +18,7 @@ Semantic models are the foundation for data definition in MetricFlow, which powe - Configure semantic models in a YAML file within your dbt project directory. - Organize them under a `metrics:` folder or within project sources as needed. - + Semantic models have 6 components and this page explains the definitions with some examples: diff --git a/website/docs/docs/build/sources.md b/website/docs/docs/build/sources.md index 466bcedc688..e4fb10ac725 100644 --- a/website/docs/docs/build/sources.md +++ b/website/docs/docs/build/sources.md @@ -84,7 +84,7 @@ left join raw.jaffle_shop.customers using (customer_id) Using the `{{ source () }}` function also creates a dependency between the model and the source table. - + ### Testing and documenting sources You can also: @@ -189,7 +189,7 @@ from raw.jaffle_shop.orders The results of this query are used to determine whether the source is fresh or not: - + ### Filter diff --git a/website/docs/docs/build/sql-models.md b/website/docs/docs/build/sql-models.md index a0dd174278b..d33e4798974 100644 --- a/website/docs/docs/build/sql-models.md +++ b/website/docs/docs/build/sql-models.md @@ -254,7 +254,7 @@ create view analytics.customers as ( dbt uses the `ref` function to: * Determine the order to run the models by creating a dependent acyclic graph (DAG). - + * Manage separate environments — dbt will replace the model specified in the `ref` function with the database name for the (or view). Importantly, this is environment-aware — if you're running dbt with a target schema named `dbt_alice`, it will select from an upstream table in the same schema. 
Check out the tabs above to see this in action. diff --git a/website/docs/docs/cloud/about-cloud-develop-defer.md b/website/docs/docs/cloud/about-cloud-develop-defer.md index 37bfaacfd0c..f6478c83970 100644 --- a/website/docs/docs/cloud/about-cloud-develop-defer.md +++ b/website/docs/docs/cloud/about-cloud-develop-defer.md @@ -36,7 +36,7 @@ To enable defer in the dbt Cloud IDE, toggle the **Defer to production** button For example, if you were to start developing on a new branch with [nothing in your development schema](/reference/node-selection/defer#usage), edit a single model, and run `dbt build -s state:modified` — only the edited model would run. Any `{{ ref() }}` functions will point to the production location of the referenced models. - + ### Defer in dbt Cloud CLI diff --git a/website/docs/docs/cloud/connect-data-platform/about-connections.md b/website/docs/docs/cloud/connect-data-platform/about-connections.md index 93bbf83584f..bc4a515112d 100644 --- a/website/docs/docs/cloud/connect-data-platform/about-connections.md +++ b/website/docs/docs/cloud/connect-data-platform/about-connections.md @@ -22,7 +22,7 @@ import MSCallout from '/snippets/_microsoft-adapters-soon.md'; You can connect to your database in dbt Cloud by clicking the gear in the top right and selecting **Account Settings**. From the Account Settings page, click **+ New Project**. - + These connection instructions provide the basic fields required for configuring a data platform connection in dbt Cloud. 
For more detailed guides, which include demo project data, read our [Quickstart guides](https://docs.getdbt.com/guides) diff --git a/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md b/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md index 0186d821a54..eecf0a8e229 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md @@ -36,4 +36,4 @@ HTTP and Thrift connection methods: | Auth | Optional, supply if using Kerberos | `KERBEROS` | | Kerberos Service Name | Optional, supply if using Kerberos | `hive` | - + diff --git a/website/docs/docs/cloud/connect-data-platform/connect-databricks.md b/website/docs/docs/cloud/connect-data-platform/connect-databricks.md index 032246ad16a..ebf6be63bd1 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-databricks.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-databricks.md @@ -37,4 +37,4 @@ To set up the Databricks connection, supply the following fields: | HTTP Path | The HTTP path of the Databricks cluster or SQL warehouse | /sql/1.0/warehouses/1a23b4596cd7e8fg | | Catalog | Name of Databricks Catalog (optional) | Production | - + diff --git a/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md b/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md index 06b9dd62f1a..2109e281e6a 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md @@ -15,7 +15,7 @@ The following fields are required when creating a Postgres, Redshift, or AlloyDB **Note**: When you set up a Redshift or Postgres connection in dbt Cloud, SSL-related parameters aren't available as inputs. - + For dbt Cloud users, please log in using the default Database username and password. 
Note this is because [`IAM` authentication](https://docs.aws.amazon.com/redshift/latest/mgmt/generating-user-credentials.html) is not compatible with dbt Cloud. @@ -25,7 +25,7 @@ To connect to a Postgres, Redshift, or AlloyDB instance via an SSH tunnel, selec Once the connection is saved, a public key will be generated and displayed for the Connection. You can copy this public key to the bastion server to authorize dbt Cloud to connect to your database via the bastion server. - + #### About the Bastion server in AWS diff --git a/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md b/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md index c265529fb49..05f0c1dc07a 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md @@ -27,7 +27,7 @@ username (specifically, the `login_name`) and the corresponding user's Snowflake to authenticate dbt Cloud to run queries against Snowflake on behalf of a Snowflake user. **Note**: The schema field in the **Developer Credentials** section is a required field. - + ### Key Pair @@ -59,7 +59,7 @@ As of dbt version 1.5.0, you can use a `private_key` string in place of `private -----END ENCRYPTED PRIVATE KEY----- ``` - + ### Snowflake OAuth @@ -68,7 +68,7 @@ As of dbt version 1.5.0, you can use a `private_key` string in place of `private The OAuth auth method permits dbt Cloud to run development queries on behalf of a Snowflake user without the configuration of Snowflake password in dbt Cloud. For more information on configuring a Snowflake OAuth connection in dbt Cloud, please see [the docs on setting up Snowflake OAuth](/docs/cloud/manage-access/set-up-snowflake-oauth). 
- + ## Configuration diff --git a/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md b/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md index 7ea6e380000..2e637b7450a 100644 --- a/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md +++ b/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md @@ -32,7 +32,7 @@ In addition to these fields, there are two other optional fields that can be con - + ### BigQuery OAuth **Available in:** Development environments, Enterprise plans only @@ -43,7 +43,7 @@ more information on the initial configuration of a BigQuery OAuth connection in [the docs on setting up BigQuery OAuth](/docs/cloud/manage-access/set-up-bigquery-oauth). As an end user, if your organization has set up BigQuery OAuth, you can link a project with your personal BigQuery account in your personal Profile in dbt Cloud, like so: - + ## Configuration diff --git a/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md b/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md index 57146ec513a..3a9f8d9e872 100644 --- a/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md +++ b/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md @@ -22,7 +22,7 @@ These [features](#dbt-cloud-ide-features) create a powerful editing environment - + @@ -83,7 +83,7 @@ There are three start-up states when using or launching the Cloud IDE: The Cloud IDE needs explicit action to save your changes. There are three ways your work is stored: - **Unsaved, local code —** The browser stores your code only in its local storage. In this state, you might need to commit any unsaved changes in order to switch branches or browsers. If you have saved and committed changes, you can access the "Change branch" option even if there are unsaved changes. But if you attempt to switch branches without saving changes, a warning message will appear, notifying you that you will lose any unsaved changes. 
- + - **Saved but uncommitted code —** When you save a file, the data gets stored in durable, long-term storage, but isn't synced back to git. To switch branches using the **Change branch** option, you must "Commit and sync" or "Revert" changes. Changing branches isn't available for saved-but-uncommitted code. This is to ensure your uncommitted changes don't get lost. - **Committed code —** This is stored in the branch with your git provider and you can check out other (remote) branches. @@ -108,7 +108,7 @@ Set up your developer credentials: 4. Enter the details under **Development Credentials**. 5. Click **Save.** - + 6. Access the Cloud IDE by clicking **Develop** at the top of the page. @@ -124,7 +124,7 @@ If a model or test fails, dbt Cloud makes it easy for you to view and download t Use dbt's [rich model selection syntax](/reference/node-selection/syntax) to [run dbt commands](/reference/dbt-commands) directly within dbt Cloud. - + ## Build and view your project's docs diff --git a/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md b/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md index 2038d4ad64c..c99b4fdc0c3 100644 --- a/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md +++ b/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md @@ -10,13 +10,13 @@ The [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud) is a tool fo This page offers comprehensive definitions and terminology of user interface elements, allowing you to navigate the IDE landscape with ease. - + ## Basic layout The IDE streamlines your workflow, and features a popular user interface layout with files and folders on the left, editor on the right, and command and console information at the bottom. - + 1. **Git repository link —** Clicking the Git repository link, located on the upper left of the IDE, takes you to your repository on the same active branch. 
* **Note:** This feature is only available for GitHub or GitLab repositories on multi-tenant dbt Cloud accounts. @@ -36,7 +36,7 @@ The IDE streamlines your workflow, and features a popular user interface layout * Added (A) — The IDE detects added files * Deleted (D) — The IDE detects deleted files. - + 5. **Command bar —** The Command bar, located in the lower left of the IDE, is used to invoke [dbt commands](/reference/dbt-commands). When a command is invoked, the associated logs are shown in the Invocation History Drawer. @@ -49,7 +49,7 @@ The IDE streamlines your workflow, and features a popular user interface layout The IDE features some delightful tools and layouts to make it easier for you to write dbt code and collaborate with teammates. - + 1. **File Editor —** The File Editor is where users edit code. Tabs break out the region for each opened file, and unsaved files are marked with a blue dot icon in the tab view. @@ -61,29 +61,29 @@ The IDE features some delightful tools and layouts to make it easier for you to - **Version Control Options menu —** Below the Git Actions button, the **Changes** section, which lists all file changes since the last commit. You can click on a change to open the Git Diff View to see the inline changes. You can also right-click any file and use the file-specific options in the Version Control Options menu. - + ## Additional editing features - **Minimap —** A Minimap (code outline) gives you a high-level overview of your source code, which is useful for quick navigation and code understanding. A file's minimap is displayed on the upper-right side of the editor. To quickly jump to different sections of your file, click the shaded area. - + - **dbt Editor Command Palette —** The dbt Editor Command Palette displays text editing actions and their associated keyboard shortcuts. This can be accessed by pressing `F1` or right-clicking in the text editing area and selecting Command Palette. 
- + - **Git Diff View —** Clicking on a file in the **Changes** section of the **Version Control Menu** will open the changed file with Git Diff view. The editor will show the previous version on the left and the in-line changes made on the right. - + - **Markdown Preview console tab —** The Markdown Preview console tab shows a preview of your .md file's markdown code in your repository and updates it automatically as you edit your code. - + - **CSV Preview console tab —** The CSV Preview console tab displays the data from your CSV file in a table, which updates automatically as you edit the file in your seed directory. - + ## Console section The console section, located below the File editor, includes various console tabs and buttons to help you with tasks such as previewing, compiling, building, and viewing the . Refer to the following sub-bullets for more details on the console tabs and buttons. - + 1. **Preview button —** When you click on the Preview button, it runs the SQL in the active file editor regardless of whether you have saved it or not and sends the results to the **Results** console tab. You can preview a selected portion of saved or unsaved code by highlighting it and then clicking the **Preview** button. @@ -107,17 +107,17 @@ Starting from dbt v1.6 or higher, when you save changes to a model, you can comp 3. **Format button —** The editor has a **Format** button that can reformat the contents of your files. For SQL files, it uses either `sqlfmt` or `sqlfluff`, and for Python files, it uses `black`. 5. **Results tab —** The Results console tab displays the most recent Preview results in tabular format. - + 6. **Compiled Code tab —** The Compile button triggers a compile invocation that generates compiled code, which is displayed in the Compiled Code tab. - + 7. **Lineage tab —** The Lineage tab in the File Editor displays the active model's lineage or . 
By default, it shows two degrees of lineage in both directions (`2+model_name+2`), however, you can change it to +model+ (full DAG). - Double-click a node in the DAG to open that file in a new tab - Expand or shrink the DAG using node selection syntax. - Note, the `--exclude` flag isn't supported. - + ## Invocation history @@ -128,7 +128,7 @@ You can open the drawer in multiple ways: - Typing a dbt command and pressing enter - Or pressing Control-backtick (or Ctrl + `) - + 1. **Invocation History list —** The left-hand panel of the Invocation History Drawer displays a list of previous invocations in the IDE, including the command, branch name, command status, and elapsed time. @@ -138,7 +138,7 @@ You can open the drawer in multiple ways: 4. **Command Control button —** Use the Command Control button, located on the right side, to control your invocation and cancel or rerun a selected run. - + 5. **Node Summary tab —** Clicking on the Results Status Tabs will filter the Node Status List based on their corresponding status. The available statuses are Pass (successful invocation of a node), Warn (test executed with a warning), Error (database error or test failure), Skip (nodes not run due to upstream error), and Queued (nodes that have not executed yet). @@ -150,25 +150,25 @@ You can open the drawer in multiple ways: ## Modals and Menus Use menus and modals to interact with IDE and access useful options to help your development workflow. -- **Editor tab menu —** To interact with open editor tabs, right-click any tab to access the helpful options in the file tab menu. +- **Editor tab menu —** To interact with open editor tabs, right-click any tab to access the helpful options in the file tab menu. - **File Search —** You can easily search for and navigate between files using the File Navigation menu, which can be accessed by pressing Command-O or Control-O or clicking on the 🔍 icon in the File Explorer. 
- + - **Global Command Palette—** The Global Command Palette provides helpful shortcuts to interact with the IDE, such as git actions, specialized dbt commands, and compile, and preview actions, among others. To open the menu, use Command-P or Control-P. - + - **IDE Status modal —** The IDE Status modal shows the current error message and debug logs for the server. This also contains an option to restart the IDE. Open this by clicking on the IDE Status button. - + - **Commit Changes modal —** The Commit Changes modal is accessible via the Git Actions button to commit all changes or via the Version Control Options menu to commit individual changes. Once you enter a commit message, you can use the modal to commit and sync the selected changes. - + - **Change Branch modal —** The Change Branch modal allows users to switch git branches in the IDE. It can be accessed through the `Change Branch` link or the Git Actions button in the Version Control menu. - + - **Revert Uncommitted Changes modal —** The Revert Uncommitted Changes modal is how users revert changes in the IDE. This is accessible via the `Revert File` option above the Version Control Options menu, or via the Git Actions button when there are saved, uncommitted changes in the IDE. - + - **IDE Options menu —** The IDE Options menu can be accessed by clicking on the three-dot menu located at the bottom right corner of the IDE. This menu contains global options such as: @@ -177,4 +177,4 @@ Use menus and modals to interact with IDE and access useful options to help your * Fully recloning your repository to refresh your git state and view status details * Viewing status details, including the IDE Status modal. 
- + diff --git a/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md b/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md index f6f2265a922..e1ff64faf2b 100644 --- a/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md +++ b/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md @@ -26,15 +26,15 @@ By default, the IDE uses sqlfmt rules to format your code, making it convenient - + - + - + - + - + @@ -63,7 +63,7 @@ Linting doesn't support ephemeral models in dbt v1.5 and lower. Refer to the [FA - **Fix** button — Automatically fixes linting errors in the **File editor**. When fixing is complete, you'll see a message confirming the outcome. - Use the **Code Quality** tab to view and debug any code errors. - + ### Customize linting @@ -130,7 +130,7 @@ group_by_and_order_by_style = implicit For more info on styling best practices, refer to [How we style our SQL](/best-practices/how-we-style/2-how-we-style-our-sql). ::: - + ## Format @@ -158,7 +158,7 @@ To enable sqlfmt: 6. Once you've selected the **sqlfmt** radio button, go to the console section (located below the **File editor**) to select the **Format** button. 7. The **Format** button auto-formats your code in the **File editor**. Once you've auto-formatted, you'll see a message confirming the outcome. - + ### Format YAML, Markdown, JSON @@ -169,7 +169,7 @@ To format your YAML, Markdown, or JSON code, dbt Cloud integrates with [Prettier 3. In the console section (located below the **File editor**), select the **Format** button to auto-format your code in the **File editor**. Use the **Code Quality** tab to view code errors. 4. Once you've auto-formatted, you'll see a message confirming the outcome. - + You can add a configuration file to customize formatting rules for YAML, Markdown, or JSON files using Prettier. The IDE looks for the configuration file based on an order of precedence. For example, it first checks for a "prettier" key in your `package.json` file. 
@@ -185,7 +185,7 @@ To format your Python code, dbt Cloud integrates with [Black](https://black.read 3. In the console section (located below the **File editor**), select the **Format** button to auto-format your code in the **File editor**. 4. Once you've auto-formatted, you'll see a message confirming the outcome. - + ## FAQs diff --git a/website/docs/docs/cloud/git/authenticate-azure.md b/website/docs/docs/cloud/git/authenticate-azure.md index 42028bf993b..bbb2cff8b29 100644 --- a/website/docs/docs/cloud/git/authenticate-azure.md +++ b/website/docs/docs/cloud/git/authenticate-azure.md @@ -16,11 +16,11 @@ Connect your dbt Cloud profile to Azure DevOps using OAuth: 1. Click the gear icon at the top right and select **Profile settings**. 2. Click **Linked Accounts**. 3. Next to Azure DevOps, click **Link**. - + 4. Once you're redirected to Azure DevOps, sign into your account. 5. When you see the permission request screen from Azure DevOps App, click **Accept**. - + You will be directed back to dbt Cloud, and your profile should be linked. You are now ready to develop in dbt Cloud! diff --git a/website/docs/docs/cloud/git/connect-github.md b/website/docs/docs/cloud/git/connect-github.md index ff0f2fff18f..715f23912e5 100644 --- a/website/docs/docs/cloud/git/connect-github.md +++ b/website/docs/docs/cloud/git/connect-github.md @@ -30,13 +30,13 @@ To connect your dbt Cloud account to your GitHub account: 2. Select **Linked Accounts** from the left menu. - + 3. In the **Linked Accounts** section, set up your GitHub account connection to dbt Cloud by clicking **Link** to the right of GitHub. This redirects you to your account on GitHub where you will be asked to install and configure the dbt Cloud application. 4. Select the GitHub organization and repositories dbt Cloud should access. - + 5. 
Assign the dbt Cloud GitHub App the following permissions: - Read access to metadata @@ -52,7 +52,7 @@ To connect your dbt Cloud account to your GitHub account: ## Limiting repository access in GitHub If you are your GitHub organization owner, you can also configure the dbt Cloud GitHub application to have access to only select repositories. This configuration must be done in GitHub, but we provide an easy link in dbt Cloud to start this process. - + ## Personally authenticate with GitHub @@ -70,7 +70,7 @@ To connect a personal GitHub account: 2. Select **Linked Accounts** in the left menu. If your GitHub account is not connected, you’ll see "No connected account". 3. Select **Link** to begin the setup process. You’ll be redirected to GitHub, and asked to authorize dbt Cloud in a grant screen. - + 4. Once you approve authorization, you will be redirected to dbt Cloud, and you should now see your connected account. diff --git a/website/docs/docs/cloud/git/connect-gitlab.md b/website/docs/docs/cloud/git/connect-gitlab.md index e55552e2d86..316e6af0135 100644 --- a/website/docs/docs/cloud/git/connect-gitlab.md +++ b/website/docs/docs/cloud/git/connect-gitlab.md @@ -22,11 +22,11 @@ To connect your GitLab account: 2. Select **Linked Accounts** in the left menu. 3. Click **Link** to the right of your GitLab account. - + When you click **Link**, you will be redirected to GitLab and prompted to sign into your account. GitLab will then ask for your explicit authorization: - + Once you've accepted, you should be redirected back to dbt Cloud, and you'll see that your account has been linked to your profile. @@ -52,7 +52,7 @@ For more detail, GitLab has a [guide for creating a Group Application](https://d In GitLab, navigate to your group settings and select **Applications**. Here you'll see a form to create a new application. 
- + In GitLab, when creating your Group Application, input the following: @@ -67,7 +67,7 @@ Replace `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cl The application form in GitLab should look as follows when completed: - + Click **Save application** in GitLab, and GitLab will then generate an **Application ID** and **Secret**. These values will be available even if you close the app screen, so this is not the only chance you have to save them. @@ -76,7 +76,7 @@ If you're a Business Critical customer using [IP restrictions](/docs/cloud/secur ### Adding the GitLab OAuth application to dbt Cloud After you've created your GitLab application, you need to provide dbt Cloud information about the app. In dbt Cloud, account admins should navigate to **Account Settings**, click on the **Integrations** tab, and expand the GitLab section. - + In dbt Cloud, input the following values: @@ -92,7 +92,7 @@ Once the form is complete in dbt Cloud, click **Save**. You will then be redirected to GitLab and prompted to sign into your account. GitLab will ask for your explicit authorization: - + Once you've accepted, you should be redirected back to dbt Cloud, and your integration is ready for developers on your team to [personally authenticate with](#personally-authenticating-with-gitlab). @@ -103,7 +103,7 @@ To connect a personal GitLab account, dbt Cloud developers should navigate to Yo If your GitLab account is not connected, you’ll see "No connected account". Select **Link** to begin the setup process. You’ll be redirected to GitLab, and asked to authorize dbt Cloud in a grant screen. - + Once you approve authorization, you will be redirected to dbt Cloud, and you should see your connected account. You're now ready to start developing in the dbt Cloud IDE or dbt Cloud CLI. 
diff --git a/website/docs/docs/cloud/git/import-a-project-by-git-url.md b/website/docs/docs/cloud/git/import-a-project-by-git-url.md index 83846bb1f0b..2ccaba1ec4d 100644 --- a/website/docs/docs/cloud/git/import-a-project-by-git-url.md +++ b/website/docs/docs/cloud/git/import-a-project-by-git-url.md @@ -37,7 +37,7 @@ If you use GitHub, you can import your repo directly using [dbt Cloud's GitHub A - After adding this key, dbt Cloud will be able to read and write files in your dbt project. - Refer to [Adding a deploy key in GitHub](https://github.blog/2015-06-16-read-only-deploy-keys/) - + ## GitLab @@ -52,7 +52,7 @@ If you use GitLab, you can import your repo directly using [dbt Cloud's GitLab A - After saving this SSH key, dbt Cloud will be able to read and write files in your GitLab repository. - Refer to [Adding a read only deploy key in GitLab](https://docs.gitlab.com/ee/ssh/#per-repository-deploy-keys) - + ## BitBucket @@ -60,7 +60,7 @@ If you use GitLab, you can import your repo directly using [dbt Cloud's GitLab A - Next, click the **Add key** button and paste in the deploy key generated by dbt Cloud for your repository. - After saving this SSH key, dbt Cloud will be able to read and write files in your BitBucket repository. - + ## AWS CodeCommit @@ -109,17 +109,17 @@ If you use Azure DevOps and you are on the dbt Cloud Enterprise plan, you can im 2. We recommend using a dedicated service user for the integration to ensure that dbt Cloud's connection to Azure DevOps is not interrupted by changes to user permissions. - + 3. Next, click the **+ New Key** button to create a new SSH key for the repository. - + 4. Select a descriptive name for the key and then paste in the deploy key generated by dbt Cloud for your repository. 5. After saving this SSH key, dbt Cloud will be able to read and write files in your Azure DevOps repository. 
- + ## Other git providers diff --git a/website/docs/docs/cloud/git/setup-azure.md b/website/docs/docs/cloud/git/setup-azure.md index 843371be6ea..b24ec577935 100644 --- a/website/docs/docs/cloud/git/setup-azure.md +++ b/website/docs/docs/cloud/git/setup-azure.md @@ -34,11 +34,11 @@ Many customers ask why they need to select Multitenant instead of Single tenant, 6. Add a redirect URI by selecting **Web** and, in the field, entering `https://YOUR_ACCESS_URL/complete/azure_active_directory`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cloud/regions-ip-addresses) for your region and plan. 7. Click **Register**. - + Here's what your app should look like before registering it: - + ## Add permissions to your new app @@ -51,7 +51,7 @@ Provide your new app access to Azure DevOps: 4. Select **Azure DevOps**. 5. Select the **user_impersonation** permission. This is the only permission available for Azure DevOps. - + ## Add another redirect URI @@ -63,7 +63,7 @@ You also need to add another redirect URI to your Azure AD application. This red `https://YOUR_ACCESS_URL/complete/azure_active_directory_service_user` 4. Click **Save**. - + @@ -77,7 +77,7 @@ If you have already connected your Azure DevOps account to Active Directory, the 4. Select the directory you want to connect. 5. Click **Connect**. - + ## Add your Azure AD app to dbt Cloud @@ -91,7 +91,7 @@ Once you connect your Azure AD app and Azure DevOps, you need to provide dbt Clo - **Application (client) ID:** Found in the Azure AD App. - **Client Secrets:** You need to first create a secret in the Azure AD App under **Client credentials**. Make sure to copy the **Value** field in the Azure AD App and paste it in the **Client Secret** field in dbt Cloud. You are responsible for the Azure AD app secret expiration and rotation. - **Directory(tenant) ID:** Found in the Azure AD App. - + Your Azure AD app should now be added to your dbt Cloud Account. 
People on your team who want to develop in the dbt Cloud IDE or dbt Cloud CLI can now personally [authorize Azure DevOps from their profiles](/docs/cloud/git/authenticate-azure). @@ -345,7 +345,7 @@ To connect the service user: 2. The admin should click **Link Azure Service User** in dbt Cloud. 3. The admin will be directed to Azure DevOps and must accept the Azure AD app's permissions. 4. Finally, the admin will be redirected to dbt Cloud, and the service user will be connected. - + Once connected, dbt Cloud displays the email address of the service user so you know which user's permissions are enabling headless actions in deployment environments. To change which account is connected, disconnect the profile in dbt Cloud, sign into the alternative Azure DevOps service account, and re-link the account in dbt Cloud. diff --git a/website/docs/docs/cloud/manage-access/audit-log.md b/website/docs/docs/cloud/manage-access/audit-log.md index 774400529e9..7170ee95ebd 100644 --- a/website/docs/docs/cloud/manage-access/audit-log.md +++ b/website/docs/docs/cloud/manage-access/audit-log.md @@ -20,7 +20,7 @@ The dbt Cloud audit log stores all the events that occurred in your organization To access the audit log, click the gear icon in the top right, then click **Audit Log**. - + ## Understanding the audit log @@ -160,7 +160,7 @@ The audit log supports various events for different objects in dbt Cloud. You wi You can search the audit log to find a specific event or actor, which is limited to the ones listed in [Events in audit log](#events-in-audit-log). The audit log successfully lists historical events spanning the last 90 days. You can search for an actor or event using the search bar, and then narrow your results using the time window. - + ## Exporting logs @@ -171,7 +171,7 @@ You can use the audit log to export all historical audit results for security, c - **For events beyond 90 days** — Select **Export All**. 
The Account Admin will receive an email link to download a CSV file of all the events that occurred in your organization. - + ### Azure Single-tenant diff --git a/website/docs/docs/cloud/manage-access/auth0-migration.md b/website/docs/docs/cloud/manage-access/auth0-migration.md index a40bb006d06..610c97e8b74 100644 --- a/website/docs/docs/cloud/manage-access/auth0-migration.md +++ b/website/docs/docs/cloud/manage-access/auth0-migration.md @@ -17,11 +17,11 @@ If you have not yet configured SSO in dbt Cloud, refer instead to our setup guid The Auth0 migration feature is being rolled out incrementally to customers who have SSO features already enabled. When the migration option has been enabled on your account, you will see **SSO Updates Available** on the right side of the menu bar, near the settings icon. - + Alternatively, you can start the process from the **Settings** page in the **Single Sign-on** pane. Click the **Begin Migration** button to start. - + Once you have opted to begin the migration process, the following steps will vary depending on the configured identity provider. You can just skip to the section that's right for your environment. These steps only apply to customers going through the migration; new setups will use the existing [setup instructions](/docs/cloud/manage-access/sso-overview). @@ -48,15 +48,15 @@ Below are sample steps to update. You must complete all of them to ensure uninte Here is an example of an updated SAML 2.0 setup in Okta. - + 2. Save the configuration, and your SAML settings will look something like this: - + 3. Toggle the `Enable new SSO authentication` option to ensure the traffic is routed correctly. _The new SSO migration action is final and cannot be undone_ - + 4. Save the settings and test the new configuration using the SSO login URL provided on the settings page. @@ -68,17 +68,17 @@ Below are steps to update. You must complete all of them to ensure uninterrupted 1. 
Open the [Google Cloud console](https://console.cloud.google.com/) and select the project with your dbt Cloud single sign-on settings. From the project page **Quick Access**, select **APIs and Services** - + 2. Click **Credentials** from the left side pane and click the appropriate name from **OAuth 2.0 Client IDs** - + 3. In the **Client ID for Web application** window, find the **Authorized Redirect URIs** field and click **Add URI** and enter `https:///login/callback`. Click **Save** once you are done. - + 4. _You will need a person with Google Workspace admin privileges to complete these steps in dbt Cloud_. In dbt Cloud, navigate to the **Account Settings**, click on **Single Sign-on**, and then click **Edit** on the right side of the SSO pane. Toggle the **Enable New SSO Authentication** option and select **Save**. This will trigger an authorization window from Google that will require admin credentials. _The migration action is final and cannot be undone_. Once the authentication has gone through, test the new configuration using the SSO login URL provided on the settings page. @@ -88,7 +88,7 @@ You must complete the domain authorization before you toggle `Enable New SSO Aut ::: - + ## Azure Active Directory @@ -98,15 +98,15 @@ Below are steps to update. You must complete all of them to ensure uninterrupted 1. Click **App Registrations** on the left side menu. - + 2. Select the proper **dbt Cloud** app (name may vary) from the list. From the app overview, click on the hyperlink next to **Redirect URI** - + 3. In the **Web** pane with **Redirect URIs**, click **Add URI** and enter the appropriate `https:///login/callback`. Save the settings and verify it is counted in the updated app overview. - + 4. Navigate to the dbt Cloud environment and open the **Account Settings**. Click the **Single Sign-on** option from the left side menu and click the **Edit** option from the right side of the SSO pane. 
The **domain** field is the domain your organization uses to login to Azure AD. Toggle the **Enable New SSO Authentication** option and **Save**. _Once this option is enabled, it cannot be undone._ @@ -116,4 +116,4 @@ You must complete the domain authorization before you toggle `Enable New SSO Aut ::: - + diff --git a/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md b/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md index 63786f40bd8..adf1ff208cc 100644 --- a/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md +++ b/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md @@ -45,7 +45,7 @@ If you're on an Enterprise plan and have the correct [permissions](/docs/cloud/m - To add a user, go to **Account Settings**, select **Users** under **Teams**. Select [**Invite Users**](docs/cloud/manage-access/invite-users). For fine-grained permission configuration, refer to [Role based access control](/docs/cloud/manage-access/enterprise-permissions). - + @@ -66,14 +66,14 @@ To add a user in dbt Cloud, you must be an account owner or have admin privilege 1. From dbt Cloud, click the gear icon at the top right and select **Account Settings**. - + 2. In **Account Settings**, select **Billing**. 3. Enter the number of developer seats you want and make sure you fill in all the payment details, including the **Billing Address** section. Leaving these blank won't allow you to save your changes. 4. Press **Update Payment Information** to save your changes. - + Now that you've updated your billing, you can now [invite users](/docs/cloud/manage-access/invite-users) to join your dbt Cloud account: @@ -87,13 +87,13 @@ To delete a user in dbt Cloud, you must be an account owner or have admin privil 1. From dbt Cloud, click the gear icon at the top right and select **Account Settings**. - + 2. In **Account Settings**, select **Users** under **Teams**. 3. Select the user you want to delete, then click **Edit**. 4. 
Click **Delete** in the bottom left. Click **Confirm Delete** to immediately delete the user without additional password prompts. This action cannot be undone. However, you can re-invite the user with the same information if the deletion was made in error. - + If you are on a **Teams** plan and you're deleting users to reduce the number of billable seats, follow these steps to lower the license count to avoid being overcharged: @@ -102,7 +102,7 @@ If you are on a **Teams** plan and you're deleting users to reduce the number of 2. Enter the number of developer seats you want and make sure you fill in all the payment details, including the **Billing Address** section. If you leave any field blank, you won't be able to save your changes. 3. Click **Update Payment Information** to save your changes. - + Great work! After completing these steps, your dbt Cloud user count and billing count should now be the same. @@ -130,7 +130,7 @@ to allocate for the user. If your account does not have an available license to allocate, you will need to add more licenses to your plan to complete the license change. - ### Mapped configuration @@ -149,7 +149,7 @@ license. To assign Read-Only licenses to certain groups of users, create a new License Mapping for the Read-Only license type and include a comma separated list of IdP group names that should receive a Read-Only license at sign-in time. - Usage notes: diff --git a/website/docs/docs/cloud/manage-access/enterprise-permissions.md b/website/docs/docs/cloud/manage-access/enterprise-permissions.md index dcacda20deb..ac2d6258819 100644 --- a/website/docs/docs/cloud/manage-access/enterprise-permissions.md +++ b/website/docs/docs/cloud/manage-access/enterprise-permissions.md @@ -28,11 +28,11 @@ Role-Based Access Control (RBAC) is helpful for automatically assigning permissi 1. Click the gear icon to the top right and select **Account Settings**. From the **Team** section, click **Groups** - + 1. 
Select an existing group or create a new group to add RBAC. Name the group (this can be any name you like, but it's recommended to keep it consistent with the SSO groups). If you have configured SSO with SAML 2.0, you may have to use the GroupID instead of the name of the group. 2. Configure the SSO provider groups you want to add RBAC by clicking **Add** in the **SSO** section. These fields are case-sensitive and must match the source group formatting. 3. Configure the permissions for users within those groups by clicking **Add** in the **Access** section of the window. - + 4. When you've completed your configurations, click **Save**. Users will begin to populate the group automatically once they have signed in to dbt Cloud with their SSO credentials. diff --git a/website/docs/docs/cloud/manage-access/invite-users.md b/website/docs/docs/cloud/manage-access/invite-users.md index 21be7010a30..f79daebf45e 100644 --- a/website/docs/docs/cloud/manage-access/invite-users.md +++ b/website/docs/docs/cloud/manage-access/invite-users.md @@ -20,11 +20,11 @@ You must have proper permissions to invite new users: 1. In your dbt Cloud account, select the gear menu in the upper right corner and then select **Account Settings**. 2. From the left sidebar, select **Users**. - + 3. Click on **Invite Users**. - + 4. In the **Email Addresses** field, enter the email addresses of the users you would like to invite separated by comma, semicolon, or a new line. 5. Select the license type for the batch of users from the **License** dropdown. @@ -40,7 +40,7 @@ dbt Cloud generates and sends emails from `support@getdbt.com` to the specified The email contains a link to create an account. When the user clicks on this they will be brought to one of two screens depending on whether SSO is configured or not. - + @@ -48,7 +48,7 @@ The email contains a link to create an account. 
When the user clicks on this the The default settings send the email, the user clicks the link, and is prompted to create their account: - + @@ -56,7 +56,7 @@ The default settings send the email, the user clicks the link, and is prompted t If SSO is configured for the environment, the user clicks the link, is brought to a confirmation screen, and presented with a link to authenticate against the company's identity provider: - + @@ -73,4 +73,4 @@ Once the user completes this process, their email and user information will popu * What happens if I need to resend the invitation? _From the Users page, click on the invite record, and you will be presented with the option to resend the invitation._ * What can I do if I entered an email address incorrectly? _From the Users page, click on the invite record, and you will be presented with the option to revoke it. Once revoked, generate a new invitation to the correct email address._ - \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md b/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md index 87018b14d56..b0930af16f7 100644 --- a/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md +++ b/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md @@ -28,7 +28,7 @@ To get started, you need to create a client ID and secret for [authentication](h In the BigQuery console, navigate to **APIs & Services** and select **Credentials**: - + On the **Credentials** page, you can see your existing keys, client IDs, and service accounts. @@ -46,7 +46,7 @@ Fill in the application, replacing `YOUR_ACCESS_URL` with the [appropriate Acces Then click **Create** to create the BigQuery OAuth app and see the app client ID and secret values. These values are available even if you close the app screen, so this isn't the only chance you have to save them. 
- + @@ -59,7 +59,7 @@ Now that you have an OAuth app set up in BigQuery, you'll need to add the client - add the client ID and secret from the BigQuery OAuth app under the **OAuth2.0 Settings** section - + ### Authenticating to BigQuery Once the BigQuery OAuth app is set up for a dbt Cloud project, each dbt Cloud user will need to authenticate with BigQuery in order to use the IDE. To do so: @@ -68,10 +68,10 @@ Once the BigQuery OAuth app is set up for a dbt Cloud project, each dbt Cloud us - Select **Credentials**. - choose your project from the list - select **Authenticate BigQuery Account** - + You will then be redirected to BigQuery and asked to approve the drive, cloud platform, and BigQuery scopes, unless the connection is less privileged. - + Select **Allow**. This redirects you back to dbt Cloud. You should now be an authenticated BigQuery user, ready to use the dbt Cloud IDE. diff --git a/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md b/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md index 679133b7844..8dcbb42ffa7 100644 --- a/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md +++ b/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md @@ -60,7 +60,7 @@ Now that you have an OAuth app set up in Databricks, you'll need to add the clie - select **Connection** to edit the connection details - add the `OAuth Client ID` and `OAuth Client Secret` from the Databricks OAuth app under the **Optional Settings** section - + ### Authenticating to Databricks (dbt Cloud IDE developer) @@ -72,6 +72,6 @@ Once the Databricks connection via OAuth is set up for a dbt Cloud project, each - Select `OAuth` as the authentication method, and click **Save** - Finalize by clicking the **Connect Databricks Account** button - + You will then be redirected to Databricks and asked to approve the connection. This redirects you back to dbt Cloud. You should now be an authenticated Databricks user, ready to use the dbt Cloud IDE. 
diff --git a/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md b/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md index 5b9abb6058a..8e38a60dd27 100644 --- a/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md +++ b/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md @@ -68,7 +68,7 @@ from Enter the Client ID and Client Secret into dbt Cloud to complete the creation of your Connection. - + ### Authorize Developer Credentials @@ -76,7 +76,7 @@ Once Snowflake SSO is enabled, users on the project will be able to configure th ### SSO OAuth Flow Diagram - + Once a user has authorized dbt Cloud with Snowflake via their identity provider, Snowflake will return a Refresh Token to the dbt Cloud application. dbt Cloud is then able to exchange this refresh token for an Access Token which can then be used to open a Snowflake connection and execute queries in the dbt Cloud IDE on behalf of users. diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md b/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md index 19779baf615..1e45de190f5 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md @@ -52,7 +52,7 @@ Client Secret for use in dbt Cloud. | **Authorized domains** | `getdbt.com` (US multi-tenant) `getdbt.com` and `dbt.com`(US Cell 1) `dbt.com` (EMEA or AU) | If deploying into a VPC, use the domain for your deployment | | **Scopes** | `email, profile, openid` | The default scopes are sufficient | - + 6. Save the **Consent screen** settings to navigate back to the **Create OAuth client id** page. @@ -65,7 +65,7 @@ Client Secret for use in dbt Cloud. | **Authorized Javascript origins** | `https://YOUR_ACCESS_URL` | | **Authorized Redirect URIs** | `https://YOUR_AUTH0_URI/login/callback` | - + 8. Press "Create" to create your new credentials. 
A popup will appear with a **Client ID** and **Client Secret**. Write these down as you will need them later! @@ -77,7 +77,7 @@ Group Membership information from the GSuite API. To enable the Admin SDK for this project, navigate to the [Admin SDK Settings page](https://console.developers.google.com/apis/api/admin.googleapis.com/overview) and ensure that the API is enabled. - + ## Configuration in dbt Cloud @@ -99,7 +99,7 @@ Settings. Cloud by navigating to `https://YOUR_ACCESS_URL/enterprise-login/LOGIN-SLUG`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cloud/regions-ip-addresses) for your region and plan. The `LOGIN-SLUG` must be unique across all dbt Cloud accounts, so pick a slug that uniquely identifies your company. - + 3. Click **Save & Authorize** to authorize your credentials. You should be dropped into the GSuite OAuth flow and prompted to log into dbt Cloud with your work email address. If authentication is successful, you will be @@ -109,7 +109,7 @@ Settings. you do not see a `groups` entry in the IdP attribute list, consult the following Troubleshooting steps. - + If the verification information looks appropriate, then you have completed the configuration of GSuite SSO. diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md b/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md index ba925fa2c24..3bb3f7165a3 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md @@ -97,23 +97,15 @@ You can use the instructions in this section to configure Okta as your identity 1. Log into your Okta account. Using the Admin dashboard, create a new app. - + -2. Select the following configurations: +1. Select the following configurations: - **Platform**: Web - **Sign on method**: SAML 2.0 -3. Click **Create** to continue the setup process. +2. Click **Create** to continue the setup process. 
- + ### Configure the Okta application @@ -131,11 +123,7 @@ Login slugs must be unique across all dbt Cloud accounts, so pick a slug that un 2. Click **Next** to continue. - + ### Configure SAML Settings @@ -145,12 +133,12 @@ Login slugs must be unique across all dbt Cloud accounts, so pick a slug that un * **Audience URI (SP Entity ID)**: `urn:auth0::` * **Relay State**: `` - + -2. Map your organization's Okta User and Group Attributes to the format that +1. Map your organization's Okta User and Group Attributes to the format that dbt Cloud expects by using the Attribute Statements and Group Attribute Statements forms. -3. The following table illustrates expected User Attribute Statements: +1. The following table illustrates expected User Attribute Statements: | Name | Name format | Value | Description | | -------------- | ----------- | -------------------- | -------------------------- | @@ -158,7 +146,7 @@ dbt Cloud expects by using the Attribute Statements and Group Attribute Statemen | `first_name` | Unspecified | `user.firstName` | _The user's first name_ | | `last_name` | Unspecified | `user.lastName` | _The user's last name_ | -4. The following table illustrates expected **Group Attribute Statements**: +2. The following table illustrates expected **Group Attribute Statements**: | Name | Name format | Filter | Value | Description | | -------- | ----------- | ------------- | ----- | ------------------------------------- | @@ -172,13 +160,9 @@ only returns 100 groups for each user, so if your users belong to more than 100 IdP groups, you will need to use a more restrictive filter**. Please contact support if you have any questions. - + -5. Click **Next** to continue. +1. Click **Next** to continue. ### Finish Okta setup @@ -187,11 +171,7 @@ support if you have any questions. 3. Click **Finish** to finish setting up the app. - + ### View setup instructions @@ -199,19 +179,11 @@ app. 2. 
In the steps below, you'll supply these values in your dbt Cloud Account Settings to complete the integration between Okta and dbt Cloud. - + - + -3. After creating the Okta application, follow the instructions in the [dbt Cloud Setup](#dbt-cloud-setup) +1. After creating the Okta application, follow the instructions in the [dbt Cloud Setup](#dbt-cloud-setup) section to complete the integration. ## Google integration @@ -426,11 +398,11 @@ To complete setup, follow the steps below in dbt Cloud: | Identity Provider Issuer | Paste the **Identity Provider Issuer** shown in the IdP setup instructions | | X.509 Certificate | Paste the **X.509 Certificate** shown in the IdP setup instructions;
**Note:** When the certificate expires, an Idp admin will have to generate a new one to be pasted into dbt Cloud for uninterrupted application access. | | Slug | Enter your desired login slug. | - -4. Click **Save** to complete setup for the SAML 2.0 integration. -5. After completing the setup, you can navigate to the URL generated for your account's _slug_ to test logging in with your identity provider. Additionally, users added the the SAML 2.0 app will be able to log in to dbt Cloud from the IdP directly. + + +1. Click **Save** to complete setup for the SAML 2.0 integration. +2. After completing the setup, you can navigate to the URL generated for your account's _slug_ to test logging in with your identity provider. Additionally, users added to the SAML 2.0 app will be able to log in to dbt Cloud from the IdP directly. diff --git a/website/docs/docs/cloud/manage-access/sso-overview.md b/website/docs/docs/cloud/manage-access/sso-overview.md index b4954955c8c..938587d59b3 100644 --- a/website/docs/docs/cloud/manage-access/sso-overview.md +++ b/website/docs/docs/cloud/manage-access/sso-overview.md @@ -24,7 +24,7 @@ Once you configure SSO, even partially, you cannot disable or revert it. When yo The diagram below explains the basic process by which users are provisioned in dbt Cloud upon logging in with SSO. - + #### Diagram Explanation diff --git a/website/docs/docs/cloud/secure/ip-restrictions.md b/website/docs/docs/cloud/secure/ip-restrictions.md index 034b3a6c144..a0206ca038d 100644 --- a/website/docs/docs/cloud/secure/ip-restrictions.md +++ b/website/docs/docs/cloud/secure/ip-restrictions.md @@ -71,6 +71,6 @@ Once you are done adding all your ranges, IP restrictions can be enabled by sele Once enabled, when someone attempts to access dbt Cloud from a restricted IP, they will encounter one of the following messages depending on whether they use email & password or SSO login. 
- + - + diff --git a/website/docs/docs/cloud/secure/postgres-privatelink.md b/website/docs/docs/cloud/secure/postgres-privatelink.md index ef07d15c128..95749bf913b 100644 --- a/website/docs/docs/cloud/secure/postgres-privatelink.md +++ b/website/docs/docs/cloud/secure/postgres-privatelink.md @@ -49,13 +49,13 @@ On the provisioned VPC endpoint service, click the **Allow principals** tab. Cli - Principal: `arn:aws:iam::346425330055:role/MTPL_Admin` - + ### 3. Obtain VPC Endpoint Service Name Once the VPC Endpoint Service is provisioned, you can find the service name in the AWS console by navigating to **VPC** → **Endpoint Services** and selecting the appropriate endpoint service. You can copy the service name field value and include it in your communication to dbt Cloud support. - + ### 4. Add the required information to the template below, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support): ``` diff --git a/website/docs/docs/cloud/secure/redshift-privatelink.md b/website/docs/docs/cloud/secure/redshift-privatelink.md index c42c703556b..da5312876fb 100644 --- a/website/docs/docs/cloud/secure/redshift-privatelink.md +++ b/website/docs/docs/cloud/secure/redshift-privatelink.md @@ -23,17 +23,17 @@ While Redshift Serverless does support Redshift-managed type VPC endpoints, this 1. On the running Redshift cluster, select the **Properties** tab. - + 2. In the **Granted accounts** section, click **Grant access**. - + 3. Enter the AWS account ID: `346425330055` - _NOTE: This account ID only applies to dbt Cloud Multi-Tenant environments. For Virtual Private/Single-Tenant account IDs please contact [Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support)._ 4. 
Choose **Grant access to all VPCs** —or— (optional) contact [Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support) for the appropriate regional VPC ID to designate in the **Grant access to specific VPCs** field. - + 5. Add the required information to the following template, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support): @@ -62,14 +62,14 @@ Creating an Interface VPC PrivateLink connection requires creating multiple AWS - **Standard Redshift** - Use IP addresses from the Redshift cluster’s **Network Interfaces** whenever possible. While IPs listed in the **Node IP addresses** section will work, they are also more likely to change. - + - There will likely be only one Network Interface (NI) to start, but if the cluster fails over to another availability zone (AZ), a new NI will also be created for that AZ. The NI IP from the original AZ will still work, but the new NI IP can also be added to the Target Group. If adding additional IPs, note that the NLB will also need to add the corresponding AZ. Once created, the NI(s) should stay the same (This is our observation from testing, but AWS does not officially document it). - **Redshift Serverless** - To find the IP addresses for Redshift Serverless instance locate and copy the endpoint (only the URL listed before the port) in the Workgroup configuration section of the AWS console for the instance. - + - From a command line run the command `nslookup ` using the endpoint found in the previous step and use the associated IP(s) for the Target Group. @@ -85,13 +85,13 @@ On the provisioned VPC endpoint service, click the **Allow principals** tab. Cli - Principal: `arn:aws:iam::346425330055:role/MTPL_Admin` - + ### 3. 
Obtain VPC Endpoint Service Name Once the VPC Endpoint Service is provisioned, you can find the service name in the AWS console by navigating to **VPC** → **Endpoint Services** and selecting the appropriate endpoint service. You can copy the service name field value and include it in your communication to dbt Cloud support. - + ### 4. Add the required information to the template below, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support): ``` diff --git a/website/docs/docs/cloud/secure/snowflake-privatelink.md b/website/docs/docs/cloud/secure/snowflake-privatelink.md index dd046259e4e..bc8f30a5566 100644 --- a/website/docs/docs/cloud/secure/snowflake-privatelink.md +++ b/website/docs/docs/cloud/secure/snowflake-privatelink.md @@ -27,7 +27,7 @@ Users connecting to Snowflake using SSO over a PrivateLink connection from dbt C - AWS account ID: `346425330055` - _NOTE: This account ID only applies to dbt Cloud Multi-Tenant environments. For Virtual Private/Single-Tenant account IDs please contact [Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support)._ - You will need to have `ACCOUNTADMIN` access to the Snowflake instance to submit a Support request. - + 2. After Snowflake has granted the requested access, run the Snowflake system function [SYSTEM$GET_PRIVATELINK_CONFIG](https://docs.snowflake.com/en/sql-reference/functions/system_get_privatelink_config.html) and copy the output. 
diff --git a/website/docs/docs/cloud/secure/vcs-privatelink.md b/website/docs/docs/cloud/secure/vcs-privatelink.md index 13bb97dd6cd..3007626190a 100644 --- a/website/docs/docs/cloud/secure/vcs-privatelink.md +++ b/website/docs/docs/cloud/secure/vcs-privatelink.md @@ -15,7 +15,7 @@ You will learn, at a high level, the resources necessary to implement this solut ## PrivateLink connection overview - + ### Required resources for creating a connection @@ -56,7 +56,7 @@ To complete the connection, dbt Labs must now provision a VPC Endpoint to connec - VPC Endpoint Service name: - + - **DNS configuration:** If the connection to the VCS service requires a custom domain and/or URL for TLS, a private hosted zone can be configured by the dbt Labs Infrastructure team in the dbt Cloud private network. For example: - **Private hosted zone:** `examplecorp.com` @@ -66,7 +66,7 @@ To complete the connection, dbt Labs must now provision a VPC Endpoint to connec When you have been notified that the resources are provisioned within the dbt Cloud environment, you must accept the endpoint connection (unless the VPC Endpoint Service is set to auto-accept connection requests). Requests can be accepted through the AWS console, as seen below, or through the AWS CLI. - + Once you accept the endpoint connection request, you can use the PrivateLink endpoint in dbt Cloud. @@ -77,6 +77,6 @@ Once dbt confirms that the PrivateLink integration is complete, you can use it i 2. Select the configured endpoint from the drop down list. 3. Click **Save**. - + - \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md b/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md index e104ea8640c..b183735da76 100644 --- a/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md +++ b/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md @@ -16,7 +16,7 @@ To set up a job to generate docs: 1. 
In the top left, click **Deploy** and select **Jobs**. 2. Create a new job or select an existing job and click **Settings**. 3. Under "Execution Settings," select **Generate docs on run**. - + 4. Click **Save**. Proceed to [configure project documentation](#configure-project-documentation) so your project generates the documentation when this job runs. @@ -44,7 +44,7 @@ You configure project documentation to generate documentation when the job you s 3. Navigate to **Projects** and select the project that needs documentation. 4. Click **Edit**. 5. Under **Artifacts**, select the job that should generate docs when it runs. - + 6. Click **Save**. ## Generating documentation @@ -52,7 +52,7 @@ You configure project documentation to generate documentation when the job you s To generate documentation in the dbt Cloud IDE, run the `dbt docs generate` command in the Command Bar in the dbt Cloud IDE. This command will generate the Docs for your dbt project as it exists in development in your IDE session. - + After generating your documentation, you can click the **Book** icon above the file tree, to see the latest version of your documentation rendered in a new browser window. @@ -65,4 +65,4 @@ These generated docs always show the last fully successful run, which means that The dbt Cloud IDE makes it possible to view [documentation](/docs/collaborate/documentation) for your dbt project while your code is still in development. With this workflow, you can inspect and verify what your project's generated documentation will look like before your changes are released to production. 
- + diff --git a/website/docs/docs/collaborate/documentation.md b/website/docs/docs/collaborate/documentation.md index 1a989806851..b6636a84eee 100644 --- a/website/docs/docs/collaborate/documentation.md +++ b/website/docs/docs/collaborate/documentation.md @@ -29,7 +29,7 @@ Importantly, dbt also provides a way to add **descriptions** to models, columns, Here's an example docs site: - + ## Adding descriptions to your project To add descriptions to your project, use the `description:` key in the same files where you declare [tests](/docs/build/data-tests), like so: @@ -177,17 +177,17 @@ up to page views and sessions. ## Navigating the documentation site Using the docs interface, you can navigate to the documentation for a specific model. That might look something like this: - + Here, you can see a representation of the project structure, a markdown description for a model, and a list of all of the columns (with documentation) in the model. From a docs page, you can click the green button in the bottom-right corner of the webpage to expand a "mini-map" of your DAG. This pane (shown below) will display the immediate parents and children of the model that you're exploring. - + In this example, the `fct_subscription_transactions` model only has one direct parent. By clicking the "Expand" button in the top-right corner of the window, we can pivot the graph horizontally and view the full lineage for our model. This lineage is filterable using the `--select` and `--exclude` flags, which are consistent with the semantics of [model selection syntax](/reference/node-selection/syntax). Further, you can right-click to interact with the DAG, jump to documentation, or share links to your graph visualization with your coworkers. 
- + ## Deploying the documentation site diff --git a/website/docs/docs/collaborate/explore-multiple-projects.md b/website/docs/docs/collaborate/explore-multiple-projects.md index 3be35110a37..9fd4be3bfae 100644 --- a/website/docs/docs/collaborate/explore-multiple-projects.md +++ b/website/docs/docs/collaborate/explore-multiple-projects.md @@ -11,12 +11,12 @@ The resource-level lineage graph for a given project displays the cross-project When you view an upstream (parent) project, its public models display a counter icon in the upper right corner indicating how many downstream (child) projects depend on them. Selecting a model reveals the lineage indicating the projects dependent on that model. These counts include all projects listing the upstream one as a dependency in its `dependencies.yml`, even without a direct `{{ ref() }}`. Selecting a project node from a public model opens its detailed lineage graph, which is subject to your [permission](/docs/cloud/manage-access/enterprise-permissions). - + When viewing a downstream (child) project that imports and refs public models from upstream (parent) projects, public models will show up in the lineage graph and display an icon on the graph edge that indicates what the relationship is to a model from another project. Hovering over this icon indicates the specific dbt Cloud project that produces that model. Double-clicking on a model from another project opens the resource-level lineage graph of the parent project, which is subject to your permissions. - + ## Explore the project-level lineage graph diff --git a/website/docs/docs/collaborate/git/managed-repository.md b/website/docs/docs/collaborate/git/managed-repository.md index db8e9840ccd..6112b84d4c6 100644 --- a/website/docs/docs/collaborate/git/managed-repository.md +++ b/website/docs/docs/collaborate/git/managed-repository.md @@ -13,7 +13,7 @@ To set up a project with a managed repository: 4. Select **Managed**. 5. Enter a name for the repository. 
For example, "analytics" or "dbt-models." 6. Click **Create**. - + dbt Cloud will host and manage this repository for you. If in the future you choose to host this repository elsewhere, you can export the information from dbt Cloud at any time. diff --git a/website/docs/docs/collaborate/git/merge-conflicts.md b/website/docs/docs/collaborate/git/merge-conflicts.md index c3c19b1e2a1..133a096da9c 100644 --- a/website/docs/docs/collaborate/git/merge-conflicts.md +++ b/website/docs/docs/collaborate/git/merge-conflicts.md @@ -35,9 +35,9 @@ The dbt Cloud IDE will display: - The file name colored in red in the **Changes** section, with a warning icon. - If you press commit without resolving the conflict, the dbt Cloud IDE will prompt a pop up box with a list which files need to be resolved. - + - + ## Resolve merge conflicts @@ -51,7 +51,7 @@ You can seamlessly resolve merge conflicts that involve competing line changes i 6. Repeat this process for every file that has a merge conflict. - + :::info Edit conflict files - If you open the conflict file under **Changes**, the file name will display something like `model.sql (last commit)` and is fully read-only and cannot be edited.
@@ -67,6 +67,6 @@ When you've resolved all the merge conflicts, the last step would be to commit t 3. The dbt Cloud IDE will return to its normal state and you can continue developing! - + - + diff --git a/website/docs/docs/collaborate/git/pr-template.md b/website/docs/docs/collaborate/git/pr-template.md index ddb4948dad9..b85aa8a0d51 100644 --- a/website/docs/docs/collaborate/git/pr-template.md +++ b/website/docs/docs/collaborate/git/pr-template.md @@ -9,7 +9,7 @@ open a new Pull Request for the code changes. To enable this functionality, ensu that a PR Template URL is configured in the Repository details page in your Account Settings. If this setting is blank, the IDE will prompt users to merge the changes directly into their default branch. - + ### PR Template URL by git provider diff --git a/website/docs/docs/collaborate/model-performance.md b/website/docs/docs/collaborate/model-performance.md index 7ef675b4e1e..f4dcb5970dd 100644 --- a/website/docs/docs/collaborate/model-performance.md +++ b/website/docs/docs/collaborate/model-performance.md @@ -23,11 +23,11 @@ You can pinpoint areas for performance enhancement by using the Performance over Each data point links to individual models in Explorer. - + You can view historical metadata for up to the past three months. Select the time horizon using the filter, which defaults to a two-week lookback. - + ## The Model performance tab @@ -38,4 +38,4 @@ You can view trends in execution times, counts, and failures by using the Model Clicking on a data point reveals a table listing all job runs for that day, with each row providing a direct link to the details of a specific run. 
- \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/collaborate/project-recommendations.md b/website/docs/docs/collaborate/project-recommendations.md index e6263a875fc..97585f0cb98 100644 --- a/website/docs/docs/collaborate/project-recommendations.md +++ b/website/docs/docs/collaborate/project-recommendations.md @@ -21,7 +21,7 @@ The Recommendations overview page includes two top-level metrics measuring the t - **Model test coverage** — The percent of models in your project (models not from a package or imported via dbt Mesh) with at least one dbt test configured on them. - **Model documentation coverage** — The percent of models in your project (models not from a package or imported via dbt Mesh) with a description. - + ## List of rules @@ -45,6 +45,6 @@ The Recommendations overview page includes two top-level metrics measuring the t Models, sources and exposures each also have a Recommendations tab on their resource details page, with the specific recommendations that correspond to that resource: - + diff --git a/website/docs/docs/dbt-cloud-apis/discovery-api.md b/website/docs/docs/dbt-cloud-apis/discovery-api.md index 747128cf7bc..983674faedf 100644 --- a/website/docs/docs/dbt-cloud-apis/discovery-api.md +++ b/website/docs/docs/dbt-cloud-apis/discovery-api.md @@ -9,7 +9,7 @@ By leveraging the metadata in dbt Cloud, you can create systems for data monitor You can access the Discovery API through [ad hoc queries](/docs/dbt-cloud-apis/discovery-querying), custom applications, a wide range of [partner ecosystem integrations](https://www.getdbt.com/product/integrations/) (like BI/analytics, catalog and governance, and quality and observability), and by using dbt Cloud features like [model timing](/docs/deploy/run-visibility#model-timing) and [dashboard status tiles](/docs/deploy/dashboard-status-tiles). 
- + You can query the dbt Cloud metadata: @@ -36,7 +36,7 @@ Use the API to look at historical information like model build time to determine You can use, for example, the [model timing](/docs/deploy/run-visibility#model-timing) tab to help identify and optimize bottlenecks in model builds: - + @@ -54,7 +54,7 @@ Use the API to find and understand dbt assets in integrated tools using informat Data producers must manage and organize data for stakeholders, while data consumers need to quickly and confidently analyze data on a large scale to make informed decisions that improve business outcomes and reduce organizational overhead. The API is useful for discovery data experiences in catalogs, analytics, apps, and machine learning (ML) tools. It can help you understand the origin and meaning of datasets for your analysis. - + @@ -68,7 +68,7 @@ Use the API to review who developed the models and who uses them to help establi Use the API to review dataset changes and uses by examining exposures, lineage, and dependencies. From the investigation, you can learn how to define and build more effective dbt projects. For more details, refer to [Development](/docs/dbt-cloud-apis/discovery-use-cases-and-examples#development). - + diff --git a/website/docs/docs/dbt-cloud-apis/discovery-querying.md b/website/docs/docs/dbt-cloud-apis/discovery-querying.md index 35c092adb4b..4e9c9cf051c 100644 --- a/website/docs/docs/dbt-cloud-apis/discovery-querying.md +++ b/website/docs/docs/dbt-cloud-apis/discovery-querying.md @@ -92,14 +92,14 @@ Refer to the [Apollo explorer documentation](https://www.apollographql.com/docs/
- + 1. Run your query by clicking the blue query button in the top right of the **Operation** editor (to the right of the query). You should see a successful query response on the right side of the explorer. - + ### Fragments diff --git a/website/docs/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md b/website/docs/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md index 8efb1ec0d37..c4ddb3fbc5f 100644 --- a/website/docs/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md +++ b/website/docs/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md @@ -25,7 +25,7 @@ For performance use cases, people typically query the historical or latest appli It’s helpful to understand how long it takes to build models (tables) and tests to execute during a dbt run. Longer model build times result in higher infrastructure costs and fresh data arriving later to stakeholders. Analyses like these can be in observability tools or ad-hoc queries, like in a notebook. - +
Example query with code @@ -158,10 +158,10 @@ plt.show() Plotting examples: - + - +
@@ -687,7 +687,7 @@ query ($environmentId: BigInt!, $first: Int!) { Lineage, enabled by the `ref` function, is at the core of dbt. Understanding lineage provides many benefits, such as understanding the structure and relationships of datasets (and metrics) and performing impact-and-root-cause analyses to resolve or present issues given changes to definitions or source data. With the Discovery API, you can construct lineage using the `parents` nodes or its `children` and query the entire upstream lineage using `ancestors`. - +
Example query with code @@ -1056,7 +1056,7 @@ For development use cases, people typically query the historical or latest defin ### How is this model or metric used in downstream tools? [Exposures](/docs/build/exposures) provide a method to define how a model or metric is actually used in dashboards and other analytics tools and use cases. You can query an exposure’s definition to see how project nodes are used and query its upstream lineage results to understand the state of the data used in it, which powers use cases like a freshness and quality status tile. - +
diff --git a/website/docs/docs/dbt-cloud-apis/service-tokens.md b/website/docs/docs/dbt-cloud-apis/service-tokens.md index b0b5fbd6cfe..a5a8a6c4807 100644 --- a/website/docs/docs/dbt-cloud-apis/service-tokens.md +++ b/website/docs/docs/dbt-cloud-apis/service-tokens.md @@ -110,7 +110,7 @@ On July 18, 2023, dbt Labs made critical infrastructure changes to service accou To rotate your token: 1. Navigate to **Account settings** and click **Service tokens** on the left side pane. 2. Verify the **Created** date for the token is _on or before_ July 18, 2023. - + 3. Click **+ New Token** on the top right side of the screen. Ensure the new token has the same permissions as the old one. 4. Copy the new token and replace the old one in your systems. Store it in a safe place, as it will not be available again once the creation screen is closed. 5. Delete the old token in dbt Cloud by clicking the **trash can icon**. _Only take this action after the new token is in place to avoid service disruptions_. diff --git a/website/docs/docs/dbt-cloud-apis/user-tokens.md b/website/docs/docs/dbt-cloud-apis/user-tokens.md index 77e536b12a5..5734f8ba35a 100644 --- a/website/docs/docs/dbt-cloud-apis/user-tokens.md +++ b/website/docs/docs/dbt-cloud-apis/user-tokens.md @@ -14,7 +14,7 @@ permissions of the user the that they were created for. You can find your User API token in the Profile page under the `API Access` label. - + ## FAQs diff --git a/website/docs/docs/dbt-cloud-environments.md b/website/docs/docs/dbt-cloud-environments.md index 522a354be97..01d24fec9b9 100644 --- a/website/docs/docs/dbt-cloud-environments.md +++ b/website/docs/docs/dbt-cloud-environments.md @@ -38,7 +38,7 @@ To create a new dbt Cloud development environment: To use the dbt Cloud IDE or dbt Cloud CLI, each developer will need to set up [personal development credentials](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud#access-the-cloud-ide) to your warehouse connection in their **Profile Settings**. 
This allows you to set separate target information and maintain individual credentials to connect to your warehouse. - + ## Deployment environment diff --git a/website/docs/docs/dbt-versions/release-notes/73-Jan-2024/partial-parsing.md b/website/docs/docs/dbt-versions/release-notes/73-Jan-2024/partial-parsing.md index c0236a30783..57cd3cc37d3 100644 --- a/website/docs/docs/dbt-versions/release-notes/73-Jan-2024/partial-parsing.md +++ b/website/docs/docs/dbt-versions/release-notes/73-Jan-2024/partial-parsing.md @@ -11,5 +11,5 @@ By default, dbt parses all the files in your project at the beginning of every d To learn more, refer to [Partial parsing](/docs/deploy/deploy-environments#partial-parsing). - + diff --git a/website/docs/docs/dbt-versions/release-notes/74-Dec-2023/external-attributes.md b/website/docs/docs/dbt-versions/release-notes/74-Dec-2023/external-attributes.md index 25791b66fb1..80bff71d176 100644 --- a/website/docs/docs/dbt-versions/release-notes/74-Dec-2023/external-attributes.md +++ b/website/docs/docs/dbt-versions/release-notes/74-Dec-2023/external-attributes.md @@ -13,4 +13,4 @@ To learn more, refer to [Extended attributes](/docs/dbt-cloud-environments#exten The **Extended Attributes** text box is available from your environment's settings page: - + diff --git a/website/docs/docs/dbt-versions/release-notes/75-Nov-2023/repo-caching.md b/website/docs/docs/dbt-versions/release-notes/75-Nov-2023/repo-caching.md index eff15e96cfd..b7b0f0f5325 100644 --- a/website/docs/docs/dbt-versions/release-notes/75-Nov-2023/repo-caching.md +++ b/website/docs/docs/dbt-versions/release-notes/75-Nov-2023/repo-caching.md @@ -11,4 +11,4 @@ Now available for dbt Cloud Enterprise plans is a new option to enable Git repos To learn more, refer to [Repo caching](/docs/deploy/deploy-environments#git-repository-caching). 
- \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/native-retry-support-rn.md b/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/native-retry-support-rn.md index 20e56879940..f4226627792 100644 --- a/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/native-retry-support-rn.md +++ b/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/native-retry-support-rn.md @@ -12,4 +12,4 @@ Previously in dbt Cloud, you could only rerun an errored job from start but now You can view which job failed to complete successfully, which command failed in the run step, and choose how to rerun it. To learn more, refer to [Retry jobs](/docs/deploy/retry-jobs). - + diff --git a/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/sl-ga.md b/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/sl-ga.md index a81abec5d42..4cd0185a528 100644 --- a/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/sl-ga.md +++ b/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/sl-ga.md @@ -20,7 +20,7 @@ It aims to bring the best of modeling and semantics to downstream applications b - dbt Cloud [multi-tenant regional](/docs/cloud/about-cloud/regions-ip-addresses) support for North America, EMEA, and APAC. Single-tenant support coming soon. - Use the APIs to call an export (a way to build tables in your data platform), then access them in your preferred BI tool. Starting from dbt v1.7 or higher, you will be able to schedule exports as part of your dbt job. - + The dbt Semantic Layer is available to [dbt Cloud Team or Enterprise](https://www.getdbt.com/) multi-tenant plans on dbt v1.6 or higher. - Team and Enterprise customers can use 1,000 Queried Metrics per month for no additional cost on a limited trial basis, subject to reasonable use limitations. Refer to [Billing](/docs/cloud/billing#what-counts-as-a-queried-metric) for more information. 
diff --git a/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/ci-updates-phase2-rn.md b/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/ci-updates-phase2-rn.md index a8ae1ade65b..434d24edcbf 100644 --- a/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/ci-updates-phase2-rn.md +++ b/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/ci-updates-phase2-rn.md @@ -11,7 +11,7 @@ dbt Cloud now has two distinct job types: [deploy jobs](/docs/deploy/deploy-jobs With two types of jobs, instead of one generic type, we can better guide you through the setup flow. Best practices are built into the default settings so you can go from curious to being set up in seconds. - + And, we now have more efficient state comparisons on CI checks: never waste a build or test on code that hasn’t been changed. We now diff between the Git pull request (PR) code and what’s running in production more efficiently with the introduction of deferral to an environment versus a job. To learn more, refer to [Continuous integration in dbt Cloud](/docs/deploy/continuous-integration). @@ -39,4 +39,4 @@ Below is a comparison table that describes how deploy jobs and CI jobs behave di To check for the job type, review your CI jobs in dbt Cloud's [Run History](/docs/deploy/run-visibility#run-history) and check for the **CI Job** tag below the job name. If it doesn't have this tag, it was misclassified and you need to re-create the job. 
- + diff --git a/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md b/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md index 0b588376c34..dc2cdb63748 100644 --- a/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md +++ b/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md @@ -12,4 +12,4 @@ Previously, when dbt Labs released a new [version](/docs/dbt-versions/core#how-d To see which version you are currently using and to upgrade, select **Deploy** in the top navigation bar and select **Environments**. Choose the preferred environment and click **Settings**. Click **Edit** to make a change to the current dbt version. dbt Labs recommends always using the latest version whenever possible to take advantage of new features and functionality. - \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/dbt-versions/release-notes/79-July-2023/faster-run.md b/website/docs/docs/dbt-versions/release-notes/79-July-2023/faster-run.md index 5cf1f97ff25..b1d152fd91e 100644 --- a/website/docs/docs/dbt-versions/release-notes/79-July-2023/faster-run.md +++ b/website/docs/docs/dbt-versions/release-notes/79-July-2023/faster-run.md @@ -15,11 +15,11 @@ Read more on how you can experience faster run start execution and how enterpris The Scheduler takes care of preparing each dbt Cloud job to run in your cloud data platform. This [prep](/docs/deploy/job-scheduler#scheduler-queue) involves readying a Kubernetes pod with the right version of dbt installed, setting environment variables, loading data platform credentials, and git provider authorization, amongst other environment-setting tasks. Only after the environment is set up, can dbt execution begin. We display this time to the user in dbt Cloud as “prep time”. 
- + For all its strengths, Kubernetes has challenges, especially with pod management impacting run execution time. We’ve rebuilt our scheduler by ensuring faster job execution with a ready pool of pods to execute customers’ jobs. This means you won't experience long prep times at the top of the hour, and we’re determined to keep runs starting near instantaneously. Don’t just take our word, review the data yourself. - + Jobs scheduled at the top of the hour used to take over 106 seconds to prepare because of the volume of runs the scheduler has to process. Now, even with increased runs, we have reduced prep time to 27 secs (at a maximum) — a 75% speed improvement for runs at peak traffic times! diff --git a/website/docs/docs/dbt-versions/release-notes/80-June-2023/lint-format-rn.md b/website/docs/docs/dbt-versions/release-notes/80-June-2023/lint-format-rn.md index e99d1fe3e0b..35a202cf3ea 100644 --- a/website/docs/docs/dbt-versions/release-notes/80-June-2023/lint-format-rn.md +++ b/website/docs/docs/dbt-versions/release-notes/80-June-2023/lint-format-rn.md @@ -17,10 +17,10 @@ For more info, read [Lint and format your code](/docs/cloud/dbt-cloud-ide/lint-f - + - + - + diff --git a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md index 1aabe517076..38b017baa30 100644 --- a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md +++ b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md @@ -16,4 +16,4 @@ Highlights include: - Cleaner look and feel with iconography - Helpful tool tips - + diff --git a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-endpoint.md b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-endpoint.md index 050fd8339a2..86ca532c154 100644 --- 
a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-endpoint.md +++ b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-endpoint.md @@ -14,7 +14,7 @@ dbt Labs is making a change to the metadata retrieval policy for Run History in Specifically, all `GET` requests to the dbt Cloud [Runs endpoint](https://docs.getdbt.com/dbt-cloud/api-v2-legacy#tag/Runs) will return information on runs, artifacts, logs, and run steps only for the past 365 days. Additionally, the run history displayed in the dbt Cloud UI will only show runs for the past 365 days. - + We will retain older run history in cold storage and can make it available to customers who reach out to our Support team. To request older run history info, contact the Support team at [support@getdbt.com](mailto:support@getdbt.com) or use the dbt Cloud application chat by clicking the `?` icon in the dbt Cloud UI. diff --git a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md index d4d299b1d36..0bc4b76d0fc 100644 --- a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md +++ b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md @@ -8,7 +8,7 @@ tags: [May-2023, Scheduler] New usability and design improvements to the **Run History** dashboard in dbt Cloud are now available. These updates allow people to discover the information they need more easily by reducing the number of clicks, surfacing more relevant information, keeping people in flow state, and designing the look and feel to be more intuitive to use. 
- + Highlights include: diff --git a/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md b/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md index bdc89b4abde..9ceda7749cd 100644 --- a/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md +++ b/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md @@ -10,5 +10,5 @@ To help save compute time, new jobs will no longer be triggered to run by defaul For more information, refer to [Deploy jobs](/docs/deploy/deploy-jobs). - + diff --git a/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md b/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md index 2d0488d4488..41e1a5265ca 100644 --- a/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md +++ b/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md @@ -13,4 +13,4 @@ Large DAGs can take a long time (10 or more seconds, if not minutes) to render a The new button prevents large DAGs from rendering automatically. Instead, you can select **Render Lineage** to load the visualization. This should affect about 15% of the DAGs. - + diff --git a/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md b/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md index 307786c6b85..90e6ac72fea 100644 --- a/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md +++ b/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md @@ -10,4 +10,4 @@ We fixed an issue where a spotty internet connection could cause the “IDE sess We updated the health check logic so it now excludes client-side connectivity issues from the IDE session check. If you lose your internet connection, we no longer update the health-check state. 
Now, losing internet connectivity will no longer cause this unexpected message. - + diff --git a/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md b/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md index 9ff5986b4da..46c1f4bbd15 100644 --- a/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md +++ b/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md @@ -9,4 +9,4 @@ tags: [v1.1.46, March-02-2022] dbt Cloud now shows "waiting time" and "prep time" for a run, which used to be expressed in aggregate as "queue time". Waiting time captures the time dbt Cloud waits to run your job if there isn't an available run slot or if a previous run of the same job is still running. Prep time represents the time it takes dbt Cloud to ready your job to run in your cloud data warehouse. - + diff --git a/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md b/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md index e46294029ec..9f7382d6863 100644 --- a/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md +++ b/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md @@ -9,7 +9,7 @@ In dbt Cloud, both jobs and environments are configured to use a specific versio Navigate to the settings page of an environment, then click **edit**. Click the **dbt Version** dropdown bar and make your selection. From this list, you can select an available version of Core to associate with this environment. - + Be sure to save your changes before navigating away. @@ -17,7 +17,7 @@ Be sure to save your changes before navigating away. Each job in dbt Cloud can be configured to inherit parameters from the environment it belongs to. - + The example job seen in the screenshot above belongs to the environment "Prod". It inherits the dbt version of its environment as shown by the **Inherited from ENVIRONMENT_NAME (DBT_VERSION)** selection. 
You may also manually override the dbt version of a specific job to be any of the current Core releases supported by Cloud by selecting another option from the dropdown. @@ -281,7 +281,7 @@ Once you have your project compiling and running on the latest version of dbt in - + Then add a job to the new testing environment that replicates one of the production jobs your team relies on. If that job runs smoothly, you should be all set to merge your branch into main and change your development and deployment environments in your main dbt project to run off the newest version of dbt Core. diff --git a/website/docs/docs/deploy/artifacts.md b/website/docs/docs/deploy/artifacts.md index 9b3ae71e79c..7ecc05355a0 100644 --- a/website/docs/docs/deploy/artifacts.md +++ b/website/docs/docs/deploy/artifacts.md @@ -10,11 +10,11 @@ When running dbt jobs, dbt Cloud generates and saves *artifacts*. You can use th While running any job can produce artifacts, you should only associate one production job with a given project to produce the project's artifacts. You can designate this connection in the **Project details** page. To access this page, click the gear icon in the upper right, select **Account Settings**, select your project, and click **Edit** in the lower right. Under **Artifacts**, select the jobs you want to produce documentation and source freshness artifacts for. - + If you don't see your job listed, you might need to edit the job and select **Run source freshness** and **Generate docs on run**. - + When you add a production job to a project, dbt Cloud updates the content and provides links to the production documentation and source freshness artifacts it generated for that project. You can see these links by clicking **Deploy** in the upper left, selecting **Jobs**, and then selecting the production job. From the job page, you can select a specific run to see how artifacts were updated for that run only. 
@@ -25,10 +25,10 @@ When set up, dbt Cloud updates the **Documentation** link in the header tab so i Note that both the job's commands and the docs generate step (triggered by the **Generate docs on run** checkbox) must succeed during the job invocation for the project-level documentation to be populated or updated. - + ### Source Freshness As with Documentation, configuring a job for the Source Freshness artifact setting also updates the Data Sources link under **Deploy**. The new link points to the latest Source Freshness report for the selected job. - + diff --git a/website/docs/docs/deploy/ci-jobs.md b/website/docs/docs/deploy/ci-jobs.md index 149a6951fdc..fd4da3379b7 100644 --- a/website/docs/docs/deploy/ci-jobs.md +++ b/website/docs/docs/deploy/ci-jobs.md @@ -39,7 +39,7 @@ To make CI job creation easier, many options on the **CI job** page are set to d - **Generate docs on run** — Enable this option if you want to [generate project docs](/docs/collaborate/build-and-view-your-docs) when this job runs. This option is disabled by default since most teams do not want to test doc generation on every CI check. - + 4. (optional) Options in the **Advanced Settings** section: - **Environment Variables** — Define [environment variables](/docs/build/environment-variables) to customize the behavior of your project when this CI job runs. You can specify that a CI job is running in a _Staging_ or _CI_ environment by setting an environment variable and modifying your project code to behave differently, depending on the context. It's common for teams to process only a subset of data for CI runs, using environment variables to branch logic in their dbt project code. @@ -49,7 +49,7 @@ To make CI job creation easier, many options on the **CI job** page are set to d - **Threads** — By default, it’s set to 4 [threads](/docs/core/connect-data-platform/connection-profiles#understanding-threads). Increase the thread count to increase model execution concurrency. 
- **Run source freshness** — Enable this option to invoke the `dbt source freshness` command before running this CI job. Refer to [Source freshness](/docs/deploy/source-freshness) for more details. - + ## Trigger a CI job with the API @@ -77,15 +77,15 @@ The green checkmark means the dbt build and tests were successful. Clicking on t ### GitHub pull request example - + ### GitLab pull request example - + ### Azure DevOps pull request example - + ## Troubleshooting @@ -117,10 +117,10 @@ If you're experiencing any issues, review some of the common questions and answe First, make sure you have the native GitHub authentication, native GitLab authentication, or native Azure DevOps authentication set up depending on which git provider you use. After you have gone through those steps, go to Account Settings, select Projects and click on the project you'd like to reconnect through native GitHub, GitLab, or Azure DevOps auth. Then click on the repository link.



Once you're on the repository page, select Edit and then Disconnect Repository at the bottom.

- +

Confirm that you'd like to disconnect your repository. You should then see a new Configure a repository link in your old repository's place. Click through to the configuration page:

- +

Select the GitHub, GitLab, or AzureDevOps tab and reselect your repository. That should complete the setup of the project and enable you to set up a dbt Cloud CI job.
diff --git a/website/docs/docs/deploy/continuous-integration.md b/website/docs/docs/deploy/continuous-integration.md index 0f87965aada..3fe50922bfd 100644 --- a/website/docs/docs/deploy/continuous-integration.md +++ b/website/docs/docs/deploy/continuous-integration.md @@ -6,7 +6,7 @@ description: "You can set up continuous integration (CI) checks to test every si To implement a continuous integration (CI) workflow in dbt Cloud, you can set up automation that tests code changes by running [CI jobs](/docs/deploy/ci-jobs) before merging to production. dbt Cloud tracks the state of what’s running in your production environment so, when you run a CI job, only the modified data assets in your pull request (PR) and their downstream dependencies are built and tested in a staging schema. You can also view the status of the CI checks (tests) directly from within the PR; this information is posted to your Git provider as soon as a CI job completes. Additionally, you can enable settings in your Git provider that allow PRs only with successful CI checks be approved for merging. - + Using CI helps: @@ -20,7 +20,7 @@ When you [set up CI jobs](/docs/deploy/ci-jobs#set-up-ci-jobs), dbt Cloud liste dbt Cloud builds and tests the models affected by the code change in a temporary schema, unique to the PR. This process ensures that the code builds without error and that it matches the expectations as defined by the project's dbt tests. The unique schema name follows the naming convention `dbt_cloud_pr__` (for example, `dbt_cloud_pr_1862_1704`) and can be found in the run details for the given run, as shown in the following image: - + When the CI run completes, you can view the run status directly from within the pull request. dbt Cloud updates the pull request in GitHub, GitLab, or Azure DevOps with a status message indicating the results of the run. The status message states whether the models and tests ran successfully or not. 
@@ -48,5 +48,5 @@ Below describes the conditions when CI checks are run concurrently and when they When you push a new commit to a PR, dbt Cloud enqueues a new CI run for the latest commit and cancels any CI run that is (now) stale and still in flight. This can happen when you’re pushing new commits while a CI build is still in process and not yet done. By cancelling runs in a safe and deliberate way, dbt Cloud helps improve productivity and reduce data platform spend on wasteful CI runs. - + diff --git a/website/docs/docs/deploy/dashboard-status-tiles.md b/website/docs/docs/deploy/dashboard-status-tiles.md index 67aa1a93c33..2ba93606204 100644 --- a/website/docs/docs/deploy/dashboard-status-tiles.md +++ b/website/docs/docs/deploy/dashboard-status-tiles.md @@ -9,11 +9,11 @@ In dbt Cloud, the [Discovery API](/docs/dbt-cloud-apis/discovery-api) can power ## Functionality The dashboard status tile looks like this: - + The data freshness check fails if any sources feeding into the exposure are stale. The data quality check fails if any dbt tests fail. A failure state could look like this: - + Clicking into **see details** from the Dashboard Status Tile takes you to a landing page where you can learn more about the specific sources, models, and tests feeding into this exposure. @@ -56,11 +56,11 @@ Note that Mode has also built its own [integration](https://mode.com/get-dbt/) w Looker does not allow you to directly embed HTML and instead requires creating a [custom visualization](https://docs.looker.com/admin-options/platform/visualizations). One way to do this for admins is to: - Add a [new visualization](https://fishtown.looker.com/admin/visualizations) on the visualization page for Looker admins. You can use [this URL](https://metadata.cloud.getdbt.com/static/looker-viz.js) to configure a Looker visualization powered by the iFrame. It will look like this: - + - Once you have set up your custom visualization, you can use it on any dashboard! 
You can configure it with the exposure name, jobID, and token relevant to that dashboard. - + ### Tableau Tableau does not require you to embed an iFrame. You only need to use a Web Page object on your Tableau Dashboard and a URL in the following format: @@ -79,7 +79,7 @@ https://metadata.cloud.getdbt.com/exposure-tile?name=&jobId= + ### Sigma @@ -99,4 +99,4 @@ https://metadata.au.dbt.com/exposure-tile?name=&jobId=&to ``` ::: - + diff --git a/website/docs/docs/deploy/deploy-environments.md b/website/docs/docs/deploy/deploy-environments.md index 650fdb1c28a..f9f15a25aa2 100644 --- a/website/docs/docs/deploy/deploy-environments.md +++ b/website/docs/docs/deploy/deploy-environments.md @@ -26,13 +26,13 @@ import CloudEnvInfo from '/snippets/_cloud-environments-info.md'; To create a new dbt Cloud development environment, navigate to **Deploy** -> **Environments** and then click **Create Environment**. Select **Deployment** as the environment type. - + ### Set as production environment In dbt Cloud, each project can have one designated deployment environment, which serves as its production environment. This production environment is _essential_ for using features like dbt Explorer and cross-project references. It acts as the source of truth for the project's production state in dbt Cloud. - + ### Semantic Layer @@ -65,7 +65,7 @@ This section will not appear if you are using Redshift, as all values are inferr
- + #### Editable fields @@ -89,7 +89,7 @@ This section will not appear if you are using Spark, as all values are inferred
- + #### Editable fields @@ -108,7 +108,7 @@ This section allows you to determine the credentials that should be used when co
- + #### Editable fields @@ -120,7 +120,7 @@ This section allows you to determine the credentials that should be used when co
- + #### Editable fields @@ -132,7 +132,7 @@ This section allows you to determine the credentials that should be used when co
- + #### Editable fields @@ -151,7 +151,7 @@ This section allows you to determine the credentials that should be used when co
- + #### Editable fields @@ -161,7 +161,7 @@ This section allows you to determine the credentials that should be used when co
- + #### Editable fields @@ -172,7 +172,7 @@ This section allows you to determine the credentials that should be used when co
- + #### Editable fields diff --git a/website/docs/docs/deploy/deploy-jobs.md b/website/docs/docs/deploy/deploy-jobs.md index cee6e245359..3a3dbebd70e 100644 --- a/website/docs/docs/deploy/deploy-jobs.md +++ b/website/docs/docs/deploy/deploy-jobs.md @@ -38,7 +38,7 @@ You can create a deploy job and configure it to run on [scheduled days and times - **Timing** — Specify whether to [schedule](#schedule-days) the deploy job using **Frequency** that runs the job at specific times of day, **Specific Intervals** that runs the job every specified number of hours, or **Cron Schedule** that runs the job specified using [cron syntax](#custom-cron-schedule). - **Days of the Week** — By default, it’s set to every day when **Frequency** or **Specific Intervals** is chosen for **Timing**. - + 5. (optional) Options in the **Advanced Settings** section: - **Environment Variables** — Define [environment variables](/docs/build/environment-variables) to customize the behavior of your project when the deploy job runs. @@ -53,7 +53,7 @@ You can create a deploy job and configure it to run on [scheduled days and times - **dbt Version** — By default, it’s set to inherit the [dbt version](/docs/dbt-versions/core) from the environment. dbt Labs strongly recommends that you don't change the default setting. This option to change the version at the job level is useful only when you upgrade a project to the next dbt version; otherwise, mismatched versions between the environment and job can lead to confusing behavior. - **Threads** — By default, it’s set to 4 [threads](/docs/core/connect-data-platform/connection-profiles#understanding-threads). Increase the thread count to increase model execution concurrency. - + ### Schedule days @@ -80,7 +80,7 @@ dbt Cloud uses [Coordinated Universal Time](https://en.wikipedia.org/wiki/Coordi To fully customize the scheduling of your job, choose the **Custom cron schedule** option and use the cron syntax. 
With this syntax, you can specify the minute, hour, day of the month, month, and day of the week, allowing you to set up complex schedules like running a job on the first Monday of each month. - + Use tools such as [crontab.guru](https://crontab.guru/) to generate the correct cron syntax. This tool allows you to input cron snippets and returns their plain English translations. diff --git a/website/docs/docs/deploy/deployment-overview.md b/website/docs/docs/deploy/deployment-overview.md index 29934663544..bf55420918c 100644 --- a/website/docs/docs/deploy/deployment-overview.md +++ b/website/docs/docs/deploy/deployment-overview.md @@ -104,12 +104,12 @@ Learn how to use dbt Cloud's features to help your team ship timely and quality - + - + - + diff --git a/website/docs/docs/deploy/deployment-tools.md b/website/docs/docs/deploy/deployment-tools.md index cca2368f38a..64fcb1dadae 100644 --- a/website/docs/docs/deploy/deployment-tools.md +++ b/website/docs/docs/deploy/deployment-tools.md @@ -19,8 +19,8 @@ If your organization is using [Airflow](https://airflow.apache.org/), there are Installing the [dbt Cloud Provider](https://airflow.apache.org/docs/apache-airflow-providers-dbt-cloud/stable/index.html) to orchestrate dbt Cloud jobs. This package contains multiple Hooks, Operators, and Sensors to complete various actions within dbt Cloud. - - + + @@ -71,7 +71,7 @@ If your organization is using [Prefect](https://www.prefect.io/), the way you wi - As jobs are executing, you can poll dbt to see whether or not the job completes without failures, through the [Prefect user interface (UI)](https://docs.prefect.io/ui/overview/). 
- + diff --git a/website/docs/docs/deploy/job-commands.md b/website/docs/docs/deploy/job-commands.md index 26fe1931db6..aa49d638e2c 100644 --- a/website/docs/docs/deploy/job-commands.md +++ b/website/docs/docs/deploy/job-commands.md @@ -29,7 +29,7 @@ Every job invocation automatically includes the [`dbt deps`](/reference/commands **Job outcome** — During a job run, the built-in commands are "chained" together. This means if one of the run steps in the chain fails, then the next commands aren't executed, and the entire job fails with an "Error" job status. - + ### Checkbox commands diff --git a/website/docs/docs/deploy/job-notifications.md b/website/docs/docs/deploy/job-notifications.md index 548e34fc2f3..4166cf73da6 100644 --- a/website/docs/docs/deploy/job-notifications.md +++ b/website/docs/docs/deploy/job-notifications.md @@ -23,7 +23,7 @@ You can receive email alerts about jobs by configuring the dbt Cloud email notif If you're an account admin, you can choose a different email address to receive notifications. Select the **Notification email** dropdown and choose another address from the list. The list includes **Internal Users** with access to the account and **External Emails** that have been added. - To add an external email address, select the **Notification email** dropdown and choose **Add external email**. After you add the external email, it becomes available for selection in the **Notification email** dropdown list. External emails can be addresses that are outside of your dbt Cloud account and also for third-party integrations like [channels in Microsoft Teams](https://support.microsoft.com/en-us/office/tip-send-email-to-a-channel-2c17dbae-acdf-4209-a761-b463bdaaa4ca) and [PagerDuty email integration](https://support.pagerduty.com/docs/email-integration-guide). - + 1. Select the **Environment** for the jobs you want to receive notifications about from the dropdown. 
@@ -35,7 +35,7 @@ You can receive email alerts about jobs by configuring the dbt Cloud email notif To set up alerts on jobs from a different environment, select another **Environment** from the dropdown, **Edit** those job notification settings, and **Save** the changes. - + ### Unsubscribe from email notifications 1. From the gear menu, choose **Notification settings**. @@ -75,7 +75,7 @@ Any account admin can edit the Slack notifications but they'll be limited to con To set up alerts on jobs from a different environment, select another **Environment** from the dropdown, **Edit** those job notification settings, and **Save** the changes. - + ### Disable the Slack integration diff --git a/website/docs/docs/deploy/job-scheduler.md b/website/docs/docs/deploy/job-scheduler.md index 7a4cd740804..b4ba711643c 100644 --- a/website/docs/docs/deploy/job-scheduler.md +++ b/website/docs/docs/deploy/job-scheduler.md @@ -50,7 +50,7 @@ If there is an available run slot and there isn't an actively running instance o Together, **wait time** plus **prep time** is the total time a run spends in the queue (or **Time in queue**). - + ### Treatment of CI jobs When compared to deployment jobs, the scheduler behaves differently when handling [continuous integration (CI) jobs](/docs/deploy/continuous-integration). It queues a CI job to be processed when it's triggered to run by a Git pull request, and the conditions the scheduler checks to determine if the run can start executing are also different: @@ -80,7 +80,7 @@ The dbt Cloud scheduler prevents too many job runs from clogging the queue by ca The scheduler prevents queue clog by canceling runs that aren't needed, ensuring there is only one run of the job in the queue at any given time. If a newer run is queued, the scheduler cancels any previously queued run for that job and displays an error message. 
- + To prevent over-scheduling, users will need to take action by either refactoring the job so it runs faster or modifying its [schedule](/docs/deploy/deploy-jobs#schedule-days). diff --git a/website/docs/docs/deploy/monitor-jobs.md b/website/docs/docs/deploy/monitor-jobs.md index 45156bb341c..98fe61b4224 100644 --- a/website/docs/docs/deploy/monitor-jobs.md +++ b/website/docs/docs/deploy/monitor-jobs.md @@ -20,11 +20,11 @@ This portion of our documentation will go over dbt Cloud's various capabilities - + - + - + diff --git a/website/docs/docs/deploy/retry-jobs.md b/website/docs/docs/deploy/retry-jobs.md index beefb35379e..db703ff6a38 100644 --- a/website/docs/docs/deploy/retry-jobs.md +++ b/website/docs/docs/deploy/retry-jobs.md @@ -23,7 +23,7 @@ If your dbt job run completed with a status of **Error**, you can rerun it from If you chose to rerun from the failure point, a **Rerun failed steps** modal opens. The modal lists the run steps that will be invoked: the failed step and any skipped steps. To confirm these run steps, click **Rerun from failure**. The job reruns from the failed command in the previously failed run. A banner at the top of the **Run Summary** tab captures this with the message, "This run resumed execution from last failed step". - + ## Related content - [Retry a failed run for a job](/dbt-cloud/api-v2#/operations/Retry%20Failed%20Job) API endpoint diff --git a/website/docs/docs/deploy/run-visibility.md b/website/docs/docs/deploy/run-visibility.md index ff9abfa5b0b..01e5e591b4e 100644 --- a/website/docs/docs/deploy/run-visibility.md +++ b/website/docs/docs/deploy/run-visibility.md @@ -17,13 +17,13 @@ dbt Cloud developers can access their run history for the last 365 days through We limit self-service retrieval of run history metadata to 365 days to improve dbt Cloud's performance. For more info on the run history retrieval change, refer to [Older run history retrieval change](/docs/dbt-versions/release-notes/May-2023/run-history-endpoint). 
- + ## Access logs You can view or download in-progress and historical logs for your dbt runs. This makes it easier for the team to debug errors more efficiently. - + ## Model timing > Available on [multi-tenant](/docs/cloud/about-cloud/regions-ip-addresses) dbt Cloud accounts on the [Team or Enterprise plans](https://www.getdbt.com/pricing/). @@ -32,4 +32,4 @@ The model timing dashboard on dbt Cloud displays the composition, order, and tim You can find the dashboard on the **Run Overview** page. - + diff --git a/website/docs/docs/deploy/source-freshness.md b/website/docs/docs/deploy/source-freshness.md index 2f9fe6bc007..3c4866cd084 100644 --- a/website/docs/docs/deploy/source-freshness.md +++ b/website/docs/docs/deploy/source-freshness.md @@ -6,7 +6,7 @@ description: "Validate that data freshness meets expectations and alert if stale dbt Cloud provides a helpful interface around dbt's [source data freshness](/docs/build/sources#snapshotting-source-data-freshness) calculations. When a dbt Cloud job is configured to snapshot source data freshness, dbt Cloud will render a user interface showing you the state of the most recent snapshot. This interface is intended to help you determine if your source data freshness is meeting the service level agreement (SLA) that you've defined for your organization. - + ### Enabling source freshness snapshots @@ -15,7 +15,7 @@ dbt Cloud provides a helpful interface around dbt's [source data freshness](/doc - Select the **Generate docs on run** checkbox to automatically [generate project docs](/docs/collaborate/build-and-view-your-docs#set-up-a-documentation-job). - Select the **Run source freshness** checkbox to enable [source freshness](#checkbox) as the first step of the job. - + To enable source freshness snapshots, firstly make sure to configure your sources to [snapshot freshness information](/docs/build/sources#snapshotting-source-data-freshness). 
You can add source freshness to the list of commands in the job run steps or enable the checkbox. However, you can expect different outcomes when you configure a job by selecting the **Run source freshness** checkbox compared to adding the command to the run steps. @@ -27,7 +27,7 @@ Review the following options and outcomes: | **Add as a run step** | Add the `dbt source freshness` command to a job anywhere in your list of run steps. However, if your source data is out of date — this step will "fail", and subsequent steps will not run. dbt Cloud will trigger email notifications (if configured) based on the end state of this step.

You can create a new job to snapshot source freshness.

If you *do not* want your models to run if your source data is out of date, then it could be a good idea to run `dbt source freshness` as the first step in your job. Otherwise, we recommend adding `dbt source freshness` as the last step in the job, or creating a separate job just for this task. | - + ### Source freshness snapshot frequency diff --git a/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md b/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md index f41bceab12d..e6a50443837 100644 --- a/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md +++ b/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md @@ -16,11 +16,11 @@ New dbt Cloud accounts will automatically be created with a Development Environm To create a development environment, choose **Deploy** > **Environments** from the top left. Then, click **Create Environment**. - + Enter an environment **Name** that would help you identify it among your other environments (for example, `Nate's Development Environment`). Choose **Development** as the **Environment Type**. You can also select which **dbt Version** to use at this time. For compatibility reasons, we recommend that you select the same dbt version that you plan to use in your deployment environment. Finally, click **Save** to finish creating your development environment. - + ### Setting up developer credentials @@ -28,14 +28,14 @@ The IDE uses *developer credentials* to connect to your database. These develope New dbt Cloud accounts should have developer credentials created automatically as a part of Project creation in the initial application setup. - + New users on existing accounts *might not* have their development credentials already configured. To manage your development credentials: 1. 
Navigate to your **Credentials** under **Your Profile** settings, which you can access at `https://YOUR_ACCESS_URL/settings/profile#credentials`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cloud/regions-ip-addresses) for your region and plan. 2. Select the relevant project in the list. After entering your developer credentials, you'll be able to access the dbt IDE. - + ### Compiling and running SQL diff --git a/website/docs/docs/use-dbt-semantic-layer/sl-architecture.md b/website/docs/docs/use-dbt-semantic-layer/sl-architecture.md index 459fcfc487f..2966aebae64 100644 --- a/website/docs/docs/use-dbt-semantic-layer/sl-architecture.md +++ b/website/docs/docs/use-dbt-semantic-layer/sl-architecture.md @@ -17,7 +17,7 @@ import DeprecationNotice from '/snippets/_sl-deprecation-notice.md'; The dbt Semantic Layer allows you to define metrics and use various interfaces to query them. The Semantic Layer does the heavy lifting to find where the queried data exists in your data platform and generates the SQL to make the request (including performing joins). - + ## Components diff --git a/website/docs/faqs/API/rotate-token.md b/website/docs/faqs/API/rotate-token.md index 144c834ea8a..dd54fb271f3 100644 --- a/website/docs/faqs/API/rotate-token.md +++ b/website/docs/faqs/API/rotate-token.md @@ -19,7 +19,7 @@ To automatically rotate your API key: 2. Select **API Access** from the lefthand side. 3. In the **API** pane, click `Rotate`. - + diff --git a/website/docs/faqs/Accounts/change-users-license.md b/website/docs/faqs/Accounts/change-users-license.md index 8755b946126..ed12ba5dc14 100644 --- a/website/docs/faqs/Accounts/change-users-license.md +++ b/website/docs/faqs/Accounts/change-users-license.md @@ -10,10 +10,10 @@ To change the license type for a user from `developer` to `read-only` or `IT` in 1. From dbt Cloud, click the gear icon at the top right and select **Account Settings**. - + 2. 
In **Account Settings**, select **Users** under **Teams**. 3. Select the user you want to remove, and click **Edit** in the bottom of their profile. 4. For the **License** option, choose **Read-only** or **IT** (from **Developer**), and click **Save**. - + diff --git a/website/docs/faqs/Accounts/cloud-upgrade-instructions.md b/website/docs/faqs/Accounts/cloud-upgrade-instructions.md index f8daf393f9b..de4698879e3 100644 --- a/website/docs/faqs/Accounts/cloud-upgrade-instructions.md +++ b/website/docs/faqs/Accounts/cloud-upgrade-instructions.md @@ -30,7 +30,7 @@ To unlock your account and select a plan, review the following guidance per plan 3. Confirm your plan selection on the pop up message. 4. This automatically unlocks your dbt Cloud account, and you can now enjoy the benefits of the Developer plan. 🎉 - + ### Team plan @@ -40,7 +40,7 @@ To unlock your account and select a plan, review the following guidance per plan 4. Enter your payment details and click **Save**. 5. This automatically unlocks your dbt Cloud account, and you can now enjoy the benefits of the Team plan. 🎉 - + ### Enterprise plan @@ -48,7 +48,7 @@ To unlock your account and select a plan, review the following guidance per plan 2. Click **Contact Sales** on the right. This opens a chat window for you to contact the dbt Cloud Support team, who will connect you to our Sales team. 3. Once you submit your request, our Sales team will contact you with more information. - + 4. Alternatively, you can [contact](https://www.getdbt.com/contact/) our Sales team directly to chat about how dbt Cloud can help you and your team. diff --git a/website/docs/faqs/Accounts/delete-users.md b/website/docs/faqs/Accounts/delete-users.md index a7e422fd82c..6041eb93d9d 100644 --- a/website/docs/faqs/Accounts/delete-users.md +++ b/website/docs/faqs/Accounts/delete-users.md @@ -10,20 +10,20 @@ To delete a user in dbt Cloud, you must be an account owner or have admin privil 1. 
From dbt Cloud, click the gear icon at the top right and select **Account Settings**. - + 2. In **Account Settings**, select **Users** under **Teams**. 3. Select the user you want to delete, then click **Edit**. 4. Click **Delete** in the bottom left. Click **Confirm Delete** to immediately delete the user without additional password prompts. This action cannot be undone. However, you can re-invite the user with the same information if the deletion was made in error. - + If you are on a **Teams** plan and you are deleting users to reduce the number of billable seats, you also need to take these steps to lower the license count: 1. In **Account Settings**, select **Billing**. 2. Enter the number of developer seats you want and make sure you fill in all the payment details, including the **Billing Address** section. If you leave any field blank, you won't be able to save your changes. 3. Click **Update Payment Information** to save your changes. - + ## Related docs diff --git a/website/docs/faqs/Environments/custom-branch-settings.md b/website/docs/faqs/Environments/custom-branch-settings.md index 4bc4b85be02..6ba2a719ee8 100644 --- a/website/docs/faqs/Environments/custom-branch-settings.md +++ b/website/docs/faqs/Environments/custom-branch-settings.md @@ -28,7 +28,7 @@ For example, if you want to use the `develop` branch of a connected repository: - Enter **develop** as the name of your custom branch - Click **Save** - + ## Deployment diff --git a/website/docs/faqs/Git/git-migration.md b/website/docs/faqs/Git/git-migration.md index 775ae3679e3..454dd356285 100644 --- a/website/docs/faqs/Git/git-migration.md +++ b/website/docs/faqs/Git/git-migration.md @@ -16,7 +16,7 @@ To migrate from one git provider to another, refer to the following steps to avo 2. Go back to dbt Cloud and set up your [integration for the new git provider](/docs/cloud/git/connect-github), if needed. 3. 
Disconnect the old repository in dbt Cloud by going to **Account Settings** and then **Projects**. Click on the **Repository** link, then click **Edit** and **Disconnect**. - + 4. On the same page, connect to the new git provider repository by clicking **Configure Repository** - If you're using the native integration, you may need to OAuth to it. diff --git a/website/docs/faqs/Git/gitignore.md b/website/docs/faqs/Git/gitignore.md index 6bda9611733..cda3a9d75b9 100644 --- a/website/docs/faqs/Git/gitignore.md +++ b/website/docs/faqs/Git/gitignore.md @@ -35,7 +35,7 @@ For more info on `gitignore` syntax, refer to the [Git docs](https://git-scm.com 4. Save the changes but _don't commit_. 5. Restart the IDE by clicking on the three dots next to the **IDE Status button** on the lower right corner of the IDE screen and select **Restart IDE**. - + 6. Once the IDE restarts, go to the **File Explorer** to delete the following files or folders (if they exist). No data will be lost: * `target`, `dbt_modules`, `dbt_packages`, `logs` @@ -50,7 +50,7 @@ For more info on `gitignore` syntax, refer to the [Git docs](https://git-scm.com 12. Once the branch has changed, click the **Pull from remote** button to pull in all the changes. 13. Verify the changes by making sure the files/folders in the `.gitignore `file are in italics. - + ### Fix in the git provider @@ -144,7 +144,7 @@ For more info on `gitignore` syntax, refer to the [Git docs](https://git-scm.com 4. Save the changes but _don't commit_. 5. Restart the IDE by clicking on the three dots next to the **IDE Status button** on the lower right corner of the IDE screen and select **Restart IDE**. - + 6. Once the IDE restarts, go to the **File Explorer** to delete the following files or folders (if they exist). No data will be lost: * `target`, `dbt_modules`, `dbt_packages`, `logs` @@ -159,7 +159,7 @@ For more info on `gitignore` syntax, refer to the [Git docs](https://git-scm.com 12. 
Once the branch has changed, click the **Pull from remote** button to pull in all the changes. 13. Verify the changes by making sure the files/folders in the `.gitignore `file are in italics. - + ### Fix in the git provider diff --git a/website/docs/faqs/Project/delete-a-project.md b/website/docs/faqs/Project/delete-a-project.md index 5fde3fee9cd..21f16cbfaec 100644 --- a/website/docs/faqs/Project/delete-a-project.md +++ b/website/docs/faqs/Project/delete-a-project.md @@ -9,10 +9,10 @@ To delete a project in dbt Cloud, you must be the account owner or have admin pr 1. From dbt Cloud, click the gear icon at the top right corner and select **Account Settings**. - + 2. In **Account Settings**, select **Projects**. Click the project you want to delete from the **Projects** page. 3. Click the edit icon in the lower right-hand corner of the **Project Details**. A **Delete** option will appear on the left side of the same details view. 4. Select **Delete**. Confirm the action to immediately delete the user without additional password prompts. There will be no account password prompt, and the project is deleted immediately after confirmation. Once a project is deleted, this action cannot be undone. - + diff --git a/website/docs/faqs/Troubleshooting/gitignore.md b/website/docs/faqs/Troubleshooting/gitignore.md index 59fd4e8c866..2b668a3efb9 100644 --- a/website/docs/faqs/Troubleshooting/gitignore.md +++ b/website/docs/faqs/Troubleshooting/gitignore.md @@ -24,7 +24,7 @@ dbt_modules/ 2. Save your changes but _don't commit_ 3. Restart the IDE by clicking on the three dots next to the **IDE Status button** on the lower right of the IDE. - + 4. Select **Restart IDE**. 5. Go back to your dbt project and delete the following files or folders if you have them: @@ -35,7 +35,7 @@ dbt_modules/ 9. Merge the PR on your git provider page. 10. Switch to your main branch and click on **Pull from remote** to pull in all the changes you made to your main branch. 
You can verify the changes by making sure the files/folders in the .gitignore file are in italics. - + @@ -53,12 +53,12 @@ dbt_modules/ 2. Go to your `dbt_project.yml` file and add `tmp/` after your `target-path:` and add `log-path: "tmp/logs"`. * So it should look like: `target-path: "tmp/target"` and `log-path: "tmp/logs"`: - + 3. Save your changes but _don't commit_. 4. Restart the IDE by clicking on the three dots next to the **IDE Status button** on the lower right of the IDE. - + 5. Select **Restart IDE**. 6. Go back to your dbt project and delete the following four folders (if you have them): @@ -71,7 +71,7 @@ dbt_modules/ * Remove `tmp` from your `target-path` and completely remove the `log-path: "tmp/logs"` line. - + 9. Restart the IDE again. 10. Delete the `tmp` folder in the **File Explorer**. @@ -79,7 +79,7 @@ dbt_modules/ 12. Merge the PR in your git provider page. 13. Switch to your main branch and click on **Pull from remote** to pull in all the changes you made to your main branch. You can verify the changes by making sure the files/folders in the .gitignore file are in italics. - + diff --git a/website/docs/guides/adapter-creation.md b/website/docs/guides/adapter-creation.md index 8bf082b04a0..a5caa59bacf 100644 --- a/website/docs/guides/adapter-creation.md +++ b/website/docs/guides/adapter-creation.md @@ -107,7 +107,7 @@ A set of *materializations* and their corresponding helper macros defined in dbt Below is a diagram of how dbt-postgres, the adapter at the center of dbt-core, works. 
- + ## Prerequisites @@ -1231,17 +1231,17 @@ This can vary substantially depending on the nature of the release but a good ba Breaking this down: - Visually distinctive announcement - make it clear this is a release - + - Short written description of what is in the release - + - Links to additional resources - + - Implementation instructions: - + - Future plans - + - Contributor recognition (if applicable) - + ## Verify a new adapter diff --git a/website/docs/guides/bigquery-qs.md b/website/docs/guides/bigquery-qs.md index 9cf2447fa52..3e441efd675 100644 --- a/website/docs/guides/bigquery-qs.md +++ b/website/docs/guides/bigquery-qs.md @@ -57,7 +57,7 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen Click **Run**, then check for results from the queries. For example:
- +
2. Create new datasets from the [BigQuery Console](https://console.cloud.google.com/bigquery). For more information, refer to [Create datasets](https://cloud.google.com/bigquery/docs/datasets#create-dataset) in the Google Cloud docs. Datasets in BigQuery are equivalent to schemas in a traditional database. On the **Create dataset** page: - **Dataset ID** — Enter a name that fits the purpose. This name is used like schema in fully qualified references to your database objects such as `database.schema.table`. As an example for this guide, create one for `jaffle_shop` and another one for `stripe` afterward. @@ -65,7 +65,7 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen - **Enable table expiration** — Leave it unselected (the default). The default for the billing table expiration is 60 days. Because billing isn’t enabled for this project, GCP defaults to deprecating tables. - **Google-managed encryption key** — This option is available under **Advanced options**. Allow Google to manage encryption (the default).
- +
3. After you create the `jaffle_shop` dataset, create one for `stripe` with all the same values except for **Dataset ID**. diff --git a/website/docs/guides/codespace-qs.md b/website/docs/guides/codespace-qs.md index b28b0ddaacf..c399eb494a9 100644 --- a/website/docs/guides/codespace-qs.md +++ b/website/docs/guides/codespace-qs.md @@ -35,7 +35,7 @@ dbt Labs provides a [GitHub Codespace](https://docs.github.com/en/codespaces/ove 1. Click **Code** (at the top of the new repository’s page). Under the **Codespaces** tab, choose **Create codespace on main**. Depending on how you've configured your computer's settings, this either opens a new browser tab with the Codespace development environment with VSCode running in it or opens a new VSCode window with the codespace in it. 1. Wait for the codespace to finish building by waiting for the `postCreateCommand` command to complete; this can take several minutes: - + When this command completes, you can start using the codespace development environment. The terminal the command ran in will close and you will get a prompt in a brand new terminal. diff --git a/website/docs/guides/custom-cicd-pipelines.md b/website/docs/guides/custom-cicd-pipelines.md index 1778098f752..6c1d60c93da 100644 --- a/website/docs/guides/custom-cicd-pipelines.md +++ b/website/docs/guides/custom-cicd-pipelines.md @@ -144,7 +144,7 @@ In Azure: - Click *OK* and then *Save* to save the variable - Save your new Azure pipeline - + @@ -486,9 +486,9 @@ Additionally, you’ll see the job in the run history of dbt Cloud. It should be - + - + diff --git a/website/docs/guides/databricks-qs.md b/website/docs/guides/databricks-qs.md index 5a0c5536e7f..2237582f990 100644 --- a/website/docs/guides/databricks-qs.md +++ b/website/docs/guides/databricks-qs.md @@ -42,7 +42,7 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen 1. Use your existing account or sign up for a Databricks account at [Try Databricks](https://databricks.com/). 
Complete the form with your user information.
- +
2. For the purpose of this tutorial, you will be selecting AWS as our cloud provider but if you use Azure or GCP internally, please choose one of them. The setup process will be similar. @@ -50,28 +50,28 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen 4. After setting up your password, you will be guided to choose a subscription plan. Select the `Premium` or `Enterprise` plan to access the SQL Compute functionality required for using the SQL warehouse for dbt. We have chosen `Premium` for this tutorial. Click **Continue** after selecting your plan.
- +
5. Click **Get Started** when you reach the page shown below, then click **Confirm** after you validate that you have everything needed.
- +
- +
6. Now it's time to create your first workspace. A Databricks workspace is an environment for accessing all of your Databricks assets. The workspace organizes objects like notebooks, SQL warehouses, clusters, etc into one place. Provide the name of your workspace and choose the appropriate AWS region and click **Start Quickstart**. You might get the checkbox of **I have data in S3 that I want to query with Databricks**. You do not need to check this off for the purpose of this tutorial.
- +
7. By clicking on `Start Quickstart`, you will be redirected to AWS and asked to log in if you haven’t already. After logging in, you should see a page similar to this.
- +
:::tip @@ -80,16 +80,16 @@ If you get a session error and don’t get redirected to this page, you can go b 8. There is no need to change any of the pre-filled out fields in the Parameters. Just add in your Databricks password under **Databricks Account Credentials**. Check off the Acknowledgement and click **Create stack**.
- +
- +
10. Go back to the Databricks tab. You should see that your workspace is ready to use.
- +
11. Now let’s jump into the workspace. Click **Open** and log into the workspace using the same login as you used to log into the account. @@ -102,7 +102,7 @@ If you get a session error and don’t get redirected to this page, you can go b 2. First we need a SQL warehouse. Find the drop down menu and toggle into the SQL space.
- +
3. We will be setting up a SQL warehouse now. Select **SQL Warehouses** from the left hand side console. You will see that a default SQL Warehouse exists. @@ -110,12 +110,12 @@ If you get a session error and don’t get redirected to this page, you can go b 5. Once the SQL Warehouse is up, click **New** and then **File upload** on the dropdown menu.
- +
6. Let's load the Jaffle Shop Customers data first. Drop in the `jaffle_shop_customers.csv` file into the UI.
- +
7. Update the Table Attributes at the top: @@ -129,7 +129,7 @@ If you get a session error and don’t get redirected to this page, you can go b - LAST_NAME = string
- +
8. Click **Create** on the bottom once you’re done. @@ -137,11 +137,11 @@ If you get a session error and don’t get redirected to this page, you can go b 9. Now let’s do the same for `Jaffle Shop Orders` and `Stripe Payments`.
- +
- +
10. Once that's done, make sure you can query the training data. Navigate to the `SQL Editor` through the left hand menu. This will bring you to a query editor. @@ -154,7 +154,7 @@ If you get a session error and don’t get redirected to this page, you can go b ```
- +
12. To ensure any users who might be working on your dbt project has access to your object, run this command. diff --git a/website/docs/guides/dbt-python-snowpark.md b/website/docs/guides/dbt-python-snowpark.md index 110445344e9..fce0ad692f6 100644 --- a/website/docs/guides/dbt-python-snowpark.md +++ b/website/docs/guides/dbt-python-snowpark.md @@ -51,19 +51,19 @@ Overall we are going to set up the environments, build scalable pipelines in dbt 1. Log in to your trial Snowflake account. You can [sign up for a Snowflake Trial Account using this form](https://signup.snowflake.com/) if you don’t have one. 2. Ensure that your account is set up using **AWS** in the **US East (N. Virginia)**. We will be copying the data from a public AWS S3 bucket hosted by dbt Labs in the us-east-1 region. By ensuring our Snowflake environment setup matches our bucket region, we avoid any multi-region data copy and retrieval latency issues. - + 3. After creating your account and verifying it from your sign-up email, Snowflake will direct you back to the UI called Snowsight. 4. When Snowsight first opens, your window should look like the following, with you logged in as the ACCOUNTADMIN with demo worksheets open: - + 5. Navigate to **Admin > Billing & Terms**. Click **Enable > Acknowledge & Continue** to enable Anaconda Python Packages to run in Snowflake. - + - + 6. Finally, create a new Worksheet by selecting **+ Worksheet** in the upper right corner. @@ -80,7 +80,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 3. Rename the worksheet to `data setup script` since we will be placing code in this worksheet to ingest the Formula 1 data. Make sure you are still logged in as the **ACCOUNTADMIN** and select the **COMPUTE_WH** warehouse. - + 4. Copy the following code into the main body of the Snowflake worksheet. 
You can also find this setup script under the `setup` folder in the [Git repository](https://github.com/dbt-labs/python-snowpark-formula1/blob/main/setup/setup_script_s3_to_snowflake.sql). The script is long since it's bring in all of the data we'll need today! @@ -233,7 +233,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 5. Ensure all the commands are selected before running the query — an easy way to do this is to use Ctrl-a to highlight all of the code in the worksheet. Select **run** (blue triangle icon). Notice how the dot next to your **COMPUTE_WH** turns from gray to green as you run the query. The **status** table is the final table of all 8 tables loaded in. - + 6. Let’s unpack that pretty long query we ran into component parts. We ran this query to load in our 8 Formula 1 tables from a public S3 bucket. To do this, we: - Created a new database called `formula1` and a schema called `raw` to place our raw (untransformed) data into. @@ -244,7 +244,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 7. Now let's take a look at some of our cool Formula 1 data we just loaded up! 1. Create a new worksheet by selecting the **+** then **New Worksheet**. - + 2. Navigate to **Database > Formula1 > RAW > Tables**. 3. Query the data using the following code. There are only 76 rows in the circuits table, so we don’t need to worry about limiting the amount of data we query. @@ -256,7 +256,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 5. Review the query results, you should see information about Formula 1 circuits, starting with Albert Park in Australia! 6. Finally, ensure you have all 8 tables starting with `CIRCUITS` and ending with `STATUS`. Now we are ready to connect into dbt Cloud! - + ## Configure dbt Cloud @@ -264,19 +264,19 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 2. 
Navigate out of your worksheet back by selecting **home**. 3. In Snowsight, confirm that you are using the **ACCOUNTADMIN** role. 4. Navigate to the **Admin** **> Partner Connect**. Find **dbt** either by using the search bar or navigating the **Data Integration**. Select the **dbt** tile. - + 5. You should now see a new window that says **Connect to dbt**. Select **Optional Grant** and add the `FORMULA1` database. This will grant access for your new dbt user role to the FORMULA1 database. - + 6. Ensure the `FORMULA1` is present in your optional grant before clicking **Connect**.  This will create a dedicated dbt user, database, warehouse, and role for your dbt Cloud trial. - + 7. When you see the **Your partner account has been created** window, click **Activate**. 8. You should be redirected to a dbt Cloud registration page. Fill out the form. Make sure to save the password somewhere for login in the future. - + 9. Select **Complete Registration**. You should now be redirected to your dbt Cloud account, complete with a connection to your Snowflake account, a deployment and a development environment, and a sample job. @@ -286,43 +286,43 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 1. First we are going to change the name of our default schema to where our dbt models will build. By default, the name is `dbt_`. We will change this to `dbt_` to create your own personal development schema. To do this, select **Profile Settings** from the gear icon in the upper right. - + 2. Navigate to the **Credentials** menu and select **Partner Connect Trial**, which will expand the credentials menu. - + 3. Click **Edit** and change the name of your schema from `dbt_` to `dbt_YOUR_NAME` replacing `YOUR_NAME` with your initials and name (`hwatson` is used in the lab screenshots). Be sure to click **Save** for your changes! - + 4. We now have our own personal development schema, amazing! 
When we run our first dbt models they will build into this schema. 5. Let’s open up dbt Cloud’s Integrated Development Environment (IDE) and familiarize ourselves. Choose **Develop** at the top of the UI. 6. When the IDE is done loading, click **Initialize dbt project**. The initialization process creates a collection of files and folders necessary to run your dbt project. - + 7. After the initialization is finished, you can view the files and folders in the file tree menu. As we move through the workshop we'll be sure to touch on a few key files and folders that we'll work with to build out our project. 8. Next click **Commit and push** to commit the new files and folders from the initialize step. We always want our commit messages to be relevant to the work we're committing, so be sure to provide a message like `initialize project` and select **Commit Changes**. - + - + 9. [Committing](https://www.atlassian.com/git/tutorials/saving-changes/git-commit) your work here will save it to the managed git repository that was created during the Partner Connect signup. This initial commit is the only commit that will be made directly to our `main` branch and from *here on out we'll be doing all of our work on a development branch*. This allows us to keep our development work separate from our production code. 10. There are a couple of key features to point out about the IDE before we get to work. It is a text editor, an SQL and Python runner, and a CLI with Git version control all baked into one package! This allows you to focus on editing your SQL and Python files, previewing the results with the SQL runner (it even runs Jinja!), and building models at the command line without having to move between different applications. The Git workflow in dbt Cloud allows both Git beginners and experts alike to be able to easily version control all of their work with a couple clicks. - + 11. Let's run our first dbt models! 
Two example models are included in your dbt project in the `models/examples` folder that we can use to illustrate how to run dbt at the command line. Type `dbt run` into the command line and click **Enter** on your keyboard. When the run bar expands you'll be able to see the results of the run, where you should see the run complete successfully. - + 12. The run results allow you to see the code that dbt compiles and sends to Snowflake for execution. To view the logs for this run, select one of the model tabs using the  **>** icon and then **Details**. If you scroll down a bit you'll be able to see the compiled code and how dbt interacts with Snowflake. Given that this run took place in our development environment, the models were created in your development schema. - + 13. Now let's switch over to Snowflake to confirm that the objects were actually created. Click on the three dots **…** above your database objects and then **Refresh**. Expand the **PC_DBT_DB** database and you should see your development schema. Select the schema, then **Tables**  and **Views**. Now you should be able to see `MY_FIRST_DBT_MODEL` as a table and `MY_SECOND_DBT_MODEL` as a view. - + ## Create branch and set up project configs @@ -414,15 +414,15 @@ dbt Labs has developed a [project structure guide](/best-practices/how-we-struct 1. In your file tree, use your cursor and hover over the `models` subdirectory, click the three dots **…** that appear to the right of the folder name, then select **Create Folder**. We're going to add two new folders to the file path, `staging` and `formula1` (in that order) by typing `staging/formula1` into the file path. - - + + - If you click into your `models` directory now, you should see the new `staging` folder nested within `models` and the `formula1` folder nested within `staging`. 2. Create two additional folders the same as the last step. Within the `models` subdirectory, create new directories `marts/core`. 3. 
We will need to create a few more folders and subfolders using the UI. After you create all the necessary folders, your folder tree should look like this when it's all done: - + Remember you can always reference the entire project in [GitHub](https://github.com/dbt-labs/python-snowpark-formula1/tree/python-formula1) to view the complete folder and file strucutre. @@ -742,21 +742,21 @@ The next step is to set up the staging models for each of the 8 source tables. G After the source and all the staging models are complete for each of the 8 tables, your staging folder should look like this: - + 1. It’s a good time to delete our example folder since these two models are extraneous to our formula1 pipeline and `my_first_model` fails a `not_null` test that we won’t spend time investigating. dbt Cloud will warn us that this folder will be permanently deleted, and we are okay with that so select **Delete**. - + 1. Now that the staging models are built and saved, it's time to create the models in our development schema in Snowflake. To do this we're going to enter into the command line `dbt build` to run all of the models in our project, which includes the 8 new staging models and the existing example models. Your run should complete successfully and you should see green checkmarks next to all of your models in the run results. We built our 8 staging models as views and ran 13 source tests that we configured in the `f1_sources.yml` file with not that much code, pretty cool! - + Let's take a quick look in Snowflake, refresh database objects, open our development schema, and confirm that the new models are there. If you can see them, then we're good to go! - + Before we move onto the next section, be sure to commit your new models to your Git branch. Click **Commit and push** and give your commit a message like `profile, sources, and staging setup` before moving on. @@ -1055,7 +1055,7 @@ By now, we are pretty good at creating new files in the correct directories so w 1. 
Let’s talk about our lineage so far. It’s looking good 😎. We’ve shown how SQL can be used to make data type, column name changes, and handle hierarchical joins really well; all while building out our automated lineage! - + 1. Time to **Commit and push** our changes and give your commit a message like `intermediate and fact models` before moving on. @@ -1128,7 +1128,7 @@ First, we want to find out: which constructor had the fastest pit stops in 2021? - The `snowflake-snowpark-python` library has been picked up to execute our Python code. Even though this wasn’t explicitly stated this is picked up by the dbt class object because we need our Snowpark package to run Python! Python models take a bit longer to run than SQL models, however we could always speed this up by using [Snowpark-optimized Warehouses](https://docs.snowflake.com/en/user-guide/warehouses-snowpark-optimized.html) if we wanted to. Our data is sufficiently small, so we won’t worry about creating a separate warehouse for Python versus SQL files today. - + The rest of our **Details** output gives us information about how dbt and Snowpark for Python are working together to define class objects and apply a specific set of methods to run our models. @@ -1142,7 +1142,7 @@ First, we want to find out: which constructor had the fastest pit stops in 2021? ``` and preview the output: - + Not only did Red Bull have the fastest average pit stops by nearly 40 seconds, they also had the smallest standard deviation, meaning they are both fastest and most consistent teams in pit stops. By using the `.describe()` method we were able to avoid verbose SQL requiring us to create a line of code per column and repetitively use the `PERCENTILE_COUNT()` function. @@ -1187,7 +1187,7 @@ First, we want to find out: which constructor had the fastest pit stops in 2021? in the command bar. 12. Once again previewing the output of our data using the same steps for our `fastest_pit_stops_by_constructor` model. 
- + We can see that it looks like lap times are getting consistently faster over time. Then in 2010 we see an increase occur! Using outside subject matter context, we know that significant rule changes were introduced to Formula 1 in 2010 and 2011 causing slower lap times. @@ -1314,7 +1314,7 @@ At a high level we’ll be: - The `.apply()` function in the pandas library is used to apply a function to a specified axis of a DataFrame or a Series. In our case the function we used was our lambda function! - The `.apply()` function takes two arguments: the first is the function to be applied, and the second is the axis along which the function should be applied. The axis can be specified as 0 for rows or 1 for columns. We are using the default value of 0 so we aren’t explicitly writing it in the code. This means that the function will be applied to each *row* of the DataFrame or Series. 6. Let’s look at the preview of our clean dataframe after running our `ml_data_prep` model: - + ### Covariate encoding @@ -1565,7 +1565,7 @@ If you haven’t seen code like this before or use joblib files to save machine - Right now our model is only in memory, so we need to use our nifty function `save_file` to save our model file to our Snowflake stage. We save our model as a joblib file so Snowpark can easily call this model object back to create predictions. We really don’t need to know much else as a data practitioner unless we want to. It’s worth noting that joblib files aren’t able to be queried directly by SQL. To do this, we would need to transform the joblib file to an SQL querable format such as JSON or CSV (out of scope for this workshop). - Finally we want to return our dataframe, but create a new column indicating what rows were used for training and those for training. 5. Viewing our output of this model: - + 6. 
Let’s pop back over to Snowflake and check that our logistic regression model has been stored in our `MODELSTAGE` using the command: @@ -1573,10 +1573,10 @@ If you haven’t seen code like this before or use joblib files to save machine list @modelstage ``` - + 7. To investigate the commands run as part of `train_test_position` script, navigate to Snowflake query history to view it **Activity > Query History**. We can view the portions of query that we wrote such as `create or replace stage MODELSTAGE`, but we also see additional queries that Snowflake uses to interpret python code. - + ### Predicting on new data @@ -1731,7 +1731,7 @@ Since the output of our Python models are tables, we can test SQL and Python mod ### Generic tests 1. To implement generic out-of-the-box tests dbt comes with, we can use YAML files to specify information about our models. To add generic tests to our aggregates model, create a file called `aggregates.yml`, copy the code block below into the file, and save. - + ```yaml version: 2 @@ -1762,7 +1762,7 @@ Since the output of our Python models are tables, we can test SQL and Python mod ### Using macros for testing 1. Under your `macros` folder, create a new file and name it `test_all_values_gte_zero.sql`. Copy the code block below and save the file. For clarity, “gte” is an abbreviation for greater than or equal to. - + ```sql {% macro test_all_values_gte_zero(table, column) %} @@ -1776,7 +1776,7 @@ Since the output of our Python models are tables, we can test SQL and Python mod 3. We use the `{% macro %}` to indicate the start of the macro and `{% endmacro %}` for the end. The text after the beginning of the macro block is the name we are giving the macro to later call it. In this case, our macro is called `test_all_values_gte_zero`. Macros take in *arguments* to pass through, in this case the `table` and the `column`. 
In the body of the macro, we see an SQL statement that is using the `ref` function to dynamically select the table and then the column. You can always view macros without having to run them by using `dbt run-operation`. You can learn more [here](https://docs.getdbt.com/reference/commands/run-operation). 4. Great, now we want to reference this macro as a test! Let’s create a new test file called `macro_pit_stops_mean_is_positive.sql` in our `tests` folder. - + 5. Copy the following code into the file and save: @@ -1805,7 +1805,7 @@ These tests are defined in `.sql` files, typically in your `tests` directory (as Let’s add a custom test that asserts that the moving average of the lap time over the last 5 years is greater than zero (it’s impossible to have time less than 0!). It is easy to assume if this is not the case the data has been corrupted. 1. Create a file `lap_times_moving_avg_assert_positive_or_null.sql` under the `tests` folder. - + 2. Copy the following code and save the file: @@ -1841,11 +1841,11 @@ Let’s add a custom test that asserts that the moving average of the lap time o dbt test --select fastest_pit_stops_by_constructor lap_times_moving_avg ``` - + 3. All 4 of our tests passed (yay for clean data)! To understand the SQL being run against each of our tables, we can click into the details of the test. 4. Navigating into the **Details** of the `unique_fastest_pit_stops_by_constructor_name`, we can see that each line `constructor_name` should only have one row. - + ## Document your dbt project @@ -1865,17 +1865,17 @@ To start, let’s look back at our `intermediate.md` file. We can see that we pr ``` This will generate the documentation for your project. Click the book button, as shown in the screenshot below to access the docs. - + 2. Go to our project area and view `int_results`. View the description that we created in our doc block. - + 3. View the mini-lineage that looks at the model we are currently selected on (`int_results` in this case). - + 4. 
In our `dbt_project.yml`, we configured `node_colors` depending on the file directory. Starting in dbt v1.3, we can see how our lineage in our docs looks. By color coding your project, it can help you cluster together similar models or steps and more easily troubleshoot. - + ## Deploy your code @@ -1890,18 +1890,18 @@ Now that we've completed testing and documenting our work, we're ready to deploy 1. Before getting started, let's make sure that we've committed all of our work to our feature branch. If you still have work to commit, you'll be able to select the **Commit and push**, provide a message, and then select **Commit** again. 2. Once all of your work is committed, the git workflow button will now appear as **Merge to main**. Select **Merge to main** and the merge process will automatically run in the background. - + 3. When it's completed, you should see the git button read **Create branch** and the branch you're currently looking at will become **main**. 4. Now that all of our development work has been merged to the main branch, we can build our deployment job. Given that our production environment and production job were created automatically for us through Partner Connect, all we need to do here is update some default configurations to meet our needs. 5. In the menu, select **Deploy** **> Environments** - + 6. You should see two environments listed and you'll want to select the **Deployment** environment then **Settings** to modify it. 7. Before making any changes, let's touch on what is defined within this environment. The Snowflake connection shows the credentials that dbt Cloud is using for this environment and in our case they are the same as what was created for us through Partner Connect. Our deployment job will build in our `PC_DBT_DB` database and use the default Partner Connect role and warehouse to do so. The deployment credentials section also uses the info that was created in our Partner Connect job to create the credential connection. 
However, it is using the same default schema that we've been using as the schema for our development environment. 8. Let's update the schema to create a new schema specifically for our production environment. Click **Edit** to allow you to modify the existing field values. Navigate to **Deployment Credentials >** **schema.** 9. Update the schema name to **production**. Remember to select **Save** after you've made the change. - + 10. By updating the schema for our production environment to **production**, it ensures that our deployment job for this environment will build our dbt models in the **production** schema within the `PC_DBT_DB` database as defined in the Snowflake Connection section. 11. Now let's switch over to our production job. Click on the deploy tab again and then select **Jobs**. You should see an existing and preconfigured **Partner Connect Trial Job**. Similar to the environment, click on the job, then select **Settings** to modify it. Let's take a look at the job to understand it before making changes. @@ -1912,11 +1912,11 @@ Now that we've completed testing and documenting our work, we're ready to deploy So, what are we changing then? Just the name! Click **Edit** to allow you to make changes. Then update the name of the job to **Production Job** to denote this as our production deployment job. After that's done, click **Save**. 12. Now let's go to run our job. Clicking on the job name in the path at the top of the screen will take you back to the job run history page where you'll be able to click **Run run** to kick off the job. If you encounter any job failures, try running the job again before further troubleshooting. - - + + 13. Let's go over to Snowflake to confirm that everything built as expected in our production schema. Refresh the database objects in your Snowflake account and you should see the production schema now within our default Partner Connect database. 
If you click into the schema and everything ran successfully, you should be able to see all of the models we developed. - + ### Conclusion diff --git a/website/docs/guides/dremio-lakehouse.md b/website/docs/guides/dremio-lakehouse.md index 378ec857f6a..b5b020dd768 100644 --- a/website/docs/guides/dremio-lakehouse.md +++ b/website/docs/guides/dremio-lakehouse.md @@ -143,7 +143,7 @@ dremioSamples: Now that you have a running environment and a completed job, you can view the data in Dremio and expand your code. This is a snapshot of the project structure in an IDE: - + ## About the schema.yml @@ -156,7 +156,7 @@ The models correspond to both weather and trip data respectively and will be joi The sources can be found by navigating to the **Object Storage** section of the Dremio Cloud UI. - + ## About the models @@ -170,11 +170,11 @@ The sources can be found by navigating to the **Object Storage** section of the When you run the dbt job, it will create a **dev** space folder that has all the data assets created. This is what you will see in Dremio Cloud UI. Spaces in Dremio is a way to organize data assets which map to business units or data products. - + Open the **Application folder** and you will see the output of the simple transformation we did using dbt. - + ## Query the data @@ -191,6 +191,6 @@ GROUP BY vendor_id ``` - + This completes the integration setup and data is ready for business consumption. diff --git a/website/docs/guides/manual-install-qs.md b/website/docs/guides/manual-install-qs.md index fcd1e5e9599..53cf154d09e 100644 --- a/website/docs/guides/manual-install-qs.md +++ b/website/docs/guides/manual-install-qs.md @@ -67,7 +67,7 @@ $ pwd 5. Use a code editor like Atom or VSCode to open the project directory you created in the previous steps, which we named jaffle_shop. The content includes folders and `.sql` and `.yml` files generated by the `init` command.
- +
6. dbt provides the following values in the `dbt_project.yml` file: @@ -126,7 +126,7 @@ $ dbt debug ```
- +
### FAQs @@ -150,7 +150,7 @@ dbt run You should have an output that looks like this:
- +
## Commit your changes @@ -197,7 +197,7 @@ $ git checkout -b add-customers-model 4. From the command line, enter `dbt run`.
- +
When you return to the BigQuery console, you can `select` from this model. @@ -463,6 +463,6 @@ We recommend using dbt Cloud as the easiest and most reliable way to [deploy job For more info on how to get started, refer to [create and schedule jobs](/docs/deploy/deploy-jobs#create-and-schedule-jobs). - + For more information about using dbt Core to schedule a job, refer [dbt airflow](/blog/dbt-airflow-spiritual-alignment) blog post. diff --git a/website/docs/guides/microsoft-fabric-qs.md b/website/docs/guides/microsoft-fabric-qs.md index 1d1e016a6f1..2d2dd738c42 100644 --- a/website/docs/guides/microsoft-fabric-qs.md +++ b/website/docs/guides/microsoft-fabric-qs.md @@ -41,7 +41,7 @@ A public preview of Microsoft Fabric in dbt Cloud is now available! 1. Log in to your [Microsoft Fabric](http://app.fabric.microsoft.com) account. 2. On the home page, select the **Synapse Data Warehouse** tile. - + 3. From **Workspaces** on the left sidebar, navigate to your organization’s workspace. Or, you can create a new workspace; refer to [Create a workspace](https://learn.microsoft.com/en-us/fabric/get-started/create-workspaces) in the Microsoft docs for more details. 4. Choose your warehouse from the table. Or, you can create a new warehouse; refer to [Create a warehouse](https://learn.microsoft.com/en-us/fabric/data-warehouse/tutorial-create-warehouse) in the Microsoft docs for more details. @@ -100,7 +100,7 @@ A public preview of Microsoft Fabric in dbt Cloud is now available! 
); ``` - + ## Connect dbt Cloud to Microsoft Fabric diff --git a/website/docs/guides/productionize-your-dbt-databricks-project.md b/website/docs/guides/productionize-your-dbt-databricks-project.md index 3584cffba77..456c69dcb87 100644 --- a/website/docs/guides/productionize-your-dbt-databricks-project.md +++ b/website/docs/guides/productionize-your-dbt-databricks-project.md @@ -105,7 +105,7 @@ The [run history](/docs/deploy/run-visibility#run-history) dashboard in dbt Clou The deployment monitor in dbt Cloud offers a higher-level view of your run history, enabling you to gauge the health of your data pipeline over an extended period of time. This feature includes information on run durations and success rates, allowing you to identify trends in job performance, such as increasing run times or more frequent failures. The deployment monitor also highlights jobs in progress, queued, and recent failures. To access the deployment monitor click on the dbt logo in the top left corner of the dbt Cloud UI. - + By adding [status tiles](/docs/deploy/dashboard-status-tiles) to your BI dashboards, you can give stakeholders visibility into the health of your data pipeline without leaving their preferred interface. Status tiles instill confidence in your data and help prevent unnecessary inquiries or context switching. To implement dashboard status tiles, you'll need to have dbt docs with [exposures](/docs/build/exposures) defined. diff --git a/website/docs/guides/redshift-qs.md b/website/docs/guides/redshift-qs.md index 890be27e50a..26fba0c50ff 100644 --- a/website/docs/guides/redshift-qs.md +++ b/website/docs/guides/redshift-qs.md @@ -45,17 +45,17 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen 3. Click **Next** for each page until you reach the **Select acknowledgement** checkbox. Select **I acknowledge that AWS CloudFormation might create IAM resources with custom names** and click **Create Stack**. 
You should land on the stack page with a CREATE_IN_PROGRESS status. - + 4. When the stack status changes to CREATE_COMPLETE, click the **Outputs** tab on the top to view information that you will use throughout the rest of this guide. Save those credentials for later by keeping this open in a tab. 5. Type `Redshift` in the search bar at the top and click **Amazon Redshift**. - + 6. Confirm that your new Redshift cluster is listed in **Cluster overview**. Select your new cluster. The cluster name should begin with `dbtredshiftcluster-`. Then, click **Query Data**. You can choose the classic query editor or v2. We will be using the v2 version for the purpose of this guide. - + 7. You might be asked to Configure account. For this sandbox environment, we recommend selecting “Configure account”. @@ -65,9 +65,9 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen - **User name** — `dbtadmin` - **Password** — Use the autogenerated `RSadminpassword` from the output of the stack and save it for later. - + - + 9. Click **Create connection**. @@ -82,15 +82,15 @@ Now we are going to load our sample data into the S3 bucket that our Cloudformat 2. Now we are going to use the S3 bucket that you created with CloudFormation and upload the files. Go to the search bar at the top and type in `S3` and click on S3. There will be sample data in the bucket already, feel free to ignore it or use it for other modeling exploration. The bucket will be prefixed with `dbt-data-lake`. - + 3. Click on the `name of the bucket` S3 bucket. If you have multiple S3 buckets, this will be the bucket that was listed under “Workshopbucket” on the Outputs page. - + 4. Click **Upload**. Drag the three files into the UI and click the **Upload** button. - + 5. Remember the name of the S3 bucket for later. It should look like this: `s3://dbt-data-lake-xxxx`. You will need it for the next section. 6. Now let’s go back to the Redshift query editor. 
Search for Redshift in the search bar, choose your cluster, and select Query data. @@ -173,7 +173,7 @@ Now we are going to load our sample data into the S3 bucket that our Cloudformat - **Port** — `5439` - **Database** — `dbtworkshop`.
- +
5. Set your development credentials. These credentials will be used by dbt Cloud to connect to Redshift. Those credentials (as provided in your CloudFormation output) will be: @@ -181,7 +181,7 @@ Now we are going to load our sample data into the S3 bucket that our Cloudformat - **Password** — This is the autogenerated password that you used earlier in the guide - **Schema** — dbt Cloud automatically generates a schema name for you. By convention, this is `dbt_`. This is the schema connected directly to your development environment, and it's where your models will be built when running dbt within the Cloud IDE.
- +
6. Click **Test Connection**. This verifies that dbt Cloud can access your Redshift cluster. diff --git a/website/docs/guides/refactoring-legacy-sql.md b/website/docs/guides/refactoring-legacy-sql.md index a339e523020..b12baac95cd 100644 --- a/website/docs/guides/refactoring-legacy-sql.md +++ b/website/docs/guides/refactoring-legacy-sql.md @@ -44,7 +44,7 @@ While refactoring you'll be **moving around** a lot of logic, but ideally you wo To get going, you'll copy your legacy SQL query into your dbt project, by saving it in a `.sql` file under the `/models` directory of your project. - + Once you've copied it over, you'll want to `dbt run` to execute the query and populate the in your warehouse. @@ -76,7 +76,7 @@ If you're migrating multiple stored procedures into dbt, with sources you can se This allows you to consolidate modeling work on those base tables, rather than calling them separately in multiple places. - + #### Build the habit of analytics-as-code Sources are an easy way to get your feet wet using config files to define aspects of your transformation pipeline. diff --git a/website/docs/guides/set-up-ci.md b/website/docs/guides/set-up-ci.md index 89d7c5a14fa..aa4811d9339 100644 --- a/website/docs/guides/set-up-ci.md +++ b/website/docs/guides/set-up-ci.md @@ -22,7 +22,7 @@ After that, there's time to get fancy, but let's walk before we run. In this guide, we're going to add a **CI environment**, where proposed changes can be validated in the context of the entire project without impacting production systems. We will use a single set of deployment credentials (like the Prod environment), but models are built in a separate location to avoid impacting others (like the Dev environment). Your git flow will look like this: - + ### Prerequisites @@ -309,7 +309,7 @@ The team at Sunrun maintained a SOX-compliant deployment in dbt while reducing t In this section, we will add a new **QA** environment. 
New features will branch off from and be merged back into the associated `qa` branch, and a member of your team (the "Release Manager") will create a PR against `main` to be validated in the CI environment before going live. The git flow will look like this: - + ### Advanced prerequisites @@ -323,7 +323,7 @@ As noted above, this branch will outlive any individual feature, and will be the See [Custom branch behavior](/docs/dbt-cloud-environments#custom-branch-behavior). Setting `qa` as your custom branch ensures that the IDE creates new branches and PRs with the correct target, instead of using `main`. - + ### 3. Create a new QA environment diff --git a/website/docs/guides/snowflake-qs.md b/website/docs/guides/snowflake-qs.md index 5b4f9e3e2be..d90406001a2 100644 --- a/website/docs/guides/snowflake-qs.md +++ b/website/docs/guides/snowflake-qs.md @@ -143,35 +143,35 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno 1. In the Snowflake UI, click on the home icon in the upper left corner. In the left sidebar, select **Admin**. Then, select **Partner Connect**. Find the dbt tile by scrolling or by searching for dbt in the search bar. Click the tile to connect to dbt. - + If you’re using the classic version of the Snowflake UI, you can click the **Partner Connect** button in the top bar of your account. From there, click on the dbt tile to open up the connect box. - + 2. In the **Connect to dbt** popup, find the **Optional Grant** option and select the **RAW** and **ANALYTICS** databases. This will grant access for your new dbt user role to each database. Then, click **Connect**. - + - + 3. Click **Activate** when a popup appears: - + - + 4. After the new tab loads, you will see a form. If you already created a dbt Cloud account, you will be asked to provide an account name. If you haven't created account, you will be asked to provide an account name and password. - + 5. 
After you have filled out the form and clicked **Complete Registration**, you will be logged into dbt Cloud automatically. 6. From your **Account Settings** in dbt Cloud (using the gear menu in the upper right corner), choose the "Partner Connect Trial" project and select **snowflake** in the overview table. Select edit and update the fields **Database** and **Warehouse** to be `analytics` and `transforming`, respectively. - + - +
@@ -181,7 +181,7 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno 2. Enter a project name and click **Continue**. 3. For the warehouse, click **Snowflake** then **Next** to set up your connection. - + 4. Enter your **Settings** for Snowflake with: * **Account** — Find your account by using the Snowflake trial account URL and removing `snowflakecomputing.com`. The order of your account information will vary by Snowflake version. For example, Snowflake's Classic console URL might look like: `oq65696.west-us-2.azure.snowflakecomputing.com`. The AppUI or Snowsight URL might look more like: `snowflakecomputing.com/west-us-2.azure/oq65696`. In both examples, your account will be: `oq65696.west-us-2.azure`. For more information, see [Account Identifiers](https://docs.snowflake.com/en/user-guide/admin-account-identifier.html) in the Snowflake docs. @@ -192,7 +192,7 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno * **Database** — `analytics`. This tells dbt to create new models in the analytics database. * **Warehouse** — `transforming`. This tells dbt to use the transforming warehouse that was created earlier. - + 5. Enter your **Development Credentials** for Snowflake with: * **Username** — The username you created for Snowflake. The username is not your email address and is usually your first and last name together in one word. @@ -201,7 +201,7 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno * **Target name** — Leave as the default. * **Threads** — Leave as 4. This is the number of simultaneous connects that dbt Cloud will make to build models concurrently. - + 6. Click **Test Connection**. This verifies that dbt Cloud can access your Snowflake account. 7. If the connection test succeeds, click **Next**. If it fails, you may need to check your Snowflake settings and credentials. 
diff --git a/website/docs/guides/starburst-galaxy-qs.md b/website/docs/guides/starburst-galaxy-qs.md index 1822c83fa90..9a6c44574cd 100644 --- a/website/docs/guides/starburst-galaxy-qs.md +++ b/website/docs/guides/starburst-galaxy-qs.md @@ -92,11 +92,11 @@ In addition to Amazon S3, Starburst Galaxy supports many other data sources. To The **Amazon S3** page should look similar to this, except for the **Authentication to S3** section which is dependant on your setup: - + 8. Click **Test connection**. This verifies that Starburst Galaxy can access your S3 bucket. 9. Click **Connect catalog** if the connection test passes. - + 10. On the **Set permissions** page, click **Skip**. You can add permissions later if you want. 11. On the **Add to cluster** page, choose the cluster you want to add the catalog to from the dropdown and click **Add to cluster**. @@ -113,7 +113,7 @@ In addition to Amazon S3, Starburst Galaxy supports many other data sources. To When done, click **Add privileges**. - + ## Create tables with Starburst Galaxy To query the Jaffle Shop data with Starburst Galaxy, you need to create tables using the Jaffle Shop data that you [loaded to your S3 bucket](#load-data-to-s3). You can do this (and run any SQL statement) from the [query editor](https://docs.starburst.io/starburst-galaxy/query/query-editor.html). @@ -121,7 +121,7 @@ To query the Jaffle Shop data with Starburst Galaxy, you need to create tables u 1. Click **Query > Query editor** on the left sidebar of the Starburst Galaxy UI. The main body of the page is now the query editor. 2. Configure the query editor so it queries your S3 bucket. In the upper right corner of the query editor, select your cluster in the first gray box and select your catalog in the second gray box: - + 3. Copy and paste these queries into the query editor. Then **Run** each query individually. @@ -181,7 +181,7 @@ To query the Jaffle Shop data with Starburst Galaxy, you need to create tables u ``` 4. 
When the queries are done, you can see the following hierarchy on the query editor's left sidebar: - + 5. Verify that the tables were created successfully. In the query editor, run the following queries: diff --git a/website/docs/reference/commands/clone.md b/website/docs/reference/commands/clone.md index 6bdc2c02e07..651d0c36908 100644 --- a/website/docs/reference/commands/clone.md +++ b/website/docs/reference/commands/clone.md @@ -49,7 +49,7 @@ You can clone nodes between states in dbt Cloud using the `dbt clone` command. T - Set up your **Production environment** and have a successful job run. - Enable **Defer to production** by toggling the switch in the lower-right corner of the command bar. - + - Run the `dbt clone` command from the command bar. diff --git a/website/docs/reference/node-selection/graph-operators.md b/website/docs/reference/node-selection/graph-operators.md index 8cba43e1b52..88d99d7b92a 100644 --- a/website/docs/reference/node-selection/graph-operators.md +++ b/website/docs/reference/node-selection/graph-operators.md @@ -29,7 +29,7 @@ dbt run --select "3+my_model+4" # select my_model, its parents up to the ### The "at" operator The `@` operator is similar to `+`, but will also include _the parents of the children of the selected model_. This is useful in continuous integration environments where you want to build a model and all of its children, but the _parents_ of those children might not exist in the database yet. The selector `@snowplow_web_page_context` will build all three models shown in the diagram below. 
- + ```bash dbt run --models @my_model # select my_model, its children, and the parents of its children diff --git a/website/docs/reference/resource-configs/bigquery-configs.md b/website/docs/reference/resource-configs/bigquery-configs.md index 8f323bc4236..a5198fd3487 100644 --- a/website/docs/reference/resource-configs/bigquery-configs.md +++ b/website/docs/reference/resource-configs/bigquery-configs.md @@ -379,7 +379,7 @@ models: - + ### Specifying tags BigQuery table and view *tags* can be created by supplying an empty string for the label value. diff --git a/website/docs/reference/resource-configs/persist_docs.md b/website/docs/reference/resource-configs/persist_docs.md index 15b1e0bdb40..481f25d4e95 100644 --- a/website/docs/reference/resource-configs/persist_docs.md +++ b/website/docs/reference/resource-configs/persist_docs.md @@ -186,8 +186,8 @@ models: Run dbt and observe that the created relation and columns are annotated with your descriptions: - - diff --git a/website/docs/reference/resource-configs/spark-configs.md b/website/docs/reference/resource-configs/spark-configs.md index ce3b317f0f1..5c32fa5fc83 100644 --- a/website/docs/reference/resource-configs/spark-configs.md +++ b/website/docs/reference/resource-configs/spark-configs.md @@ -104,7 +104,7 @@ If no `partition_by` is specified, then the `insert_overwrite` strategy will ato - This strategy is not available when connecting via Databricks SQL endpoints (`method: odbc` + `endpoint`). - If connecting via a Databricks cluster + ODBC driver (`method: odbc` + `cluster`), you **must** include `set spark.sql.sources.partitionOverwriteMode DYNAMIC` in the [cluster Spark Config](https://docs.databricks.com/clusters/configure.html#spark-config) in order for dynamic partition replacement to work (`incremental_strategy: insert_overwrite` + `partition_by`). - + + If mixing images and text together, also consider using a docs block. 
diff --git a/website/docs/terms/dag.md b/website/docs/terms/dag.md index c6b91300bfc..b108c68806a 100644 --- a/website/docs/terms/dag.md +++ b/website/docs/terms/dag.md @@ -32,7 +32,7 @@ One of the great things about DAGs is that they are *visual*. You can clearly id Take this mini-DAG for an example: - + What can you learn from this DAG? Immediately, you may notice a handful of things: @@ -57,7 +57,7 @@ You can additionally use your DAG to help identify bottlenecks, long-running dat ...to name just a few. Understanding the factors impacting model performance can help you decide on [refactoring approaches](https://courses.getdbt.com/courses/refactoring-sql-for-modularity), [changing model materialization](https://docs.getdbt.com/blog/how-we-shaved-90-minutes-off-model#attempt-2-moving-to-an-incremental-model)s, replacing multiple joins with surrogate keys, or other methods. - + ### Modular data modeling best practices @@ -83,7 +83,7 @@ The marketing team at dbt Labs would be upset with us if we told you we think db Whether you’re using dbt Core or Cloud, dbt docs and the Lineage Graph are available to all dbt developers. The Lineage Graph in dbt Docs can show a model or source’s entire lineage, all within a visual frame. Clicking within a model, you can view the Lineage Graph and adjust selectors to only show certain models within the DAG. Analyzing the DAG here is a great way to diagnose potential inefficiencies or lack of modularity in your dbt project. - + The DAG is also [available in the dbt Cloud IDE](https://www.getdbt.com/blog/on-dags-hierarchies-and-ides/), so you and your team can refer to your lineage while you build your models. 
diff --git a/website/docs/terms/data-lineage.md b/website/docs/terms/data-lineage.md index d0162c35616..1dbda6e6b67 100644 --- a/website/docs/terms/data-lineage.md +++ b/website/docs/terms/data-lineage.md @@ -63,13 +63,13 @@ In the greater data world, you may often hear of data lineage systems based on t If you use a transformation tool such as dbt that automatically infers relationships between data sources and models, a DAG automatically populates to show you the lineage that exists for your [data transformations](https://www.getdbt.com/analytics-engineering/transformation/). - + Your is used to visually show upstream dependencies, the nodes that must come before a current model, and downstream relationships, the work that is impacted by the current model. DAGs are also directional—they show a defined flow of movement and form non-cyclical loops. Ultimately, DAGs are an effective way to see relationships between data sources, models, and dashboards. DAGs are also a great way to see visual bottlenecks, or inefficiencies in your data work (see image below for a DAG with...many bottlenecks). Data teams can additionally add [meta fields](https://docs.getdbt.com/reference/resource-configs/meta) and documentation to nodes in the DAG to add an additional layer of governance to their dbt project. - + :::tip Automatic > Manual diff --git a/website/snippets/_cloud-environments-info.md b/website/snippets/_cloud-environments-info.md index 4a882589f77..349a57731bd 100644 --- a/website/snippets/_cloud-environments-info.md +++ b/website/snippets/_cloud-environments-info.md @@ -56,7 +56,7 @@ Extended Attributes is a text box extension at the environment level that overri Something to note, Extended Attributes doesn't mask secret values. We recommend avoiding setting secret values to prevent visibility in the text box and logs. -
+
If you're developing in the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud), [dbt Cloud CLI](/docs/cloud/cloud-cli-installation), or [orchestrating job runs](/docs/deploy/deployments), Extended Attributes parses through the provided YAML and extracts the `profiles.yml` attributes. For each individual attribute: @@ -91,7 +91,7 @@ dbt Cloud will use the cached copy of your project's Git repo under these circum To enable Git repository caching, select **Account settings** from the gear menu and enable the **Repository caching** option. - + :::note @@ -109,7 +109,7 @@ Partial parsing in dbt Cloud requires dbt version 1.4 or newer. The feature does To enable, select **Account settings** from the gear menu and enable the **Partial parsing** option. - + diff --git a/website/snippets/_new-sl-setup.md b/website/snippets/_new-sl-setup.md index a02481db33d..5c048824ac3 100644 --- a/website/snippets/_new-sl-setup.md +++ b/website/snippets/_new-sl-setup.md @@ -18,17 +18,17 @@ If you've configured the legacy Semantic Layer, it has been deprecated, and dbt 3. In the **Project Details** page, navigate to the **Semantic Layer** section, and select **Configure Semantic Layer**. - + 4. In the **Set Up Semantic Layer Configuration** page, enter the credentials you want the Semantic Layer to use specific to your data platform. We recommend credentials have the least privileges required because your Semantic Layer users will be querying it in downstream applications. At a minimum, the Semantic Layer needs to have read access to the schema(s) that contains the dbt models that you used to build your semantic models. - + 5. Select the deployment environment you want for the Semantic Layer and click **Save**. 6. After saving it, you'll be provided with the connection information that allows you to connect to downstream tools. If your tool supports JDBC, save the JDBC URL or individual components (like environment id and host). 
If it uses the GraphQL API, save the GraphQL API host information instead. - + 7. Save and copy your environment ID, service token, and host, which you'll need to use downstream tools. For more info on how to integrate with partner integrations, refer to [Available integrations](/docs/use-dbt-semantic-layer/avail-sl-integrations). diff --git a/website/snippets/_sl-run-prod-job.md b/website/snippets/_sl-run-prod-job.md index a637b0b431e..b666cfa8e61 100644 --- a/website/snippets/_sl-run-prod-job.md +++ b/website/snippets/_sl-run-prod-job.md @@ -4,4 +4,4 @@ Once you’ve defined metrics in your dbt project, you can perform a job run in 2. Select **Jobs** to rerun the job with the most recent code in the deployment environment. 3. Your metric should appear as a red node in the dbt Cloud IDE and dbt directed acyclic graphs (DAG). - + diff --git a/website/snippets/quickstarts/intro-build-models-atop-other-models.md b/website/snippets/quickstarts/intro-build-models-atop-other-models.md index 1104461079b..eeedec34892 100644 --- a/website/snippets/quickstarts/intro-build-models-atop-other-models.md +++ b/website/snippets/quickstarts/intro-build-models-atop-other-models.md @@ -2,4 +2,4 @@ As a best practice in SQL, you should separate logic that cleans up your data fr Now you can experiment by separating the logic out into separate models and using the [ref](/reference/dbt-jinja-functions/ref) function to build models on top of other models: - + diff --git a/website/src/components/lightbox/index.js b/website/src/components/lightbox/index.js index c2951057b57..a3d211ea237 100644 --- a/website/src/components/lightbox/index.js +++ b/website/src/components/lightbox/index.js @@ -1,43 +1,32 @@ -import React, { useState, useRef, useEffect } from 'react'; +import React, { useState, useEffect } from 'react'; import styles from './styles.module.css'; import imageCacheWrapper from '../../../functions/image-cache-wrapper'; function Lightbox({ - src, + src, collapsed = false, - alignment 
= "center", - alt = undefined, - title = undefined, + alignment = "center", + alt = undefined, + title = undefined, width = undefined, }) { const [isHovered, setIsHovered] = useState(false); - const [hoverStyle, setHoverStyle] = useState({}); - const imgRef = useRef(null); + const [expandImage, setExpandImage] = useState(false); useEffect(() => { - if (imgRef.current && !width) { - const naturalWidth = imgRef.current.naturalWidth; - const naturalHeight = imgRef.current.naturalHeight; + let timeoutId; - // Calculate the expanded size for images without a specified width - const expandedWidth = naturalWidth * 1.2; // Example: 20% increase - const expandedHeight = naturalHeight * 1.2; - - setHoverStyle({ - width: `${expandedWidth}px`, - height: `${expandedHeight}px`, - transition: 'width 0.5s ease, height 0.5s ease', - }); + if (isHovered) { + // Delay the expansion by 5 milliseconds + timeoutId = setTimeout(() => { + setExpandImage(true); + }, 5); } - }, [width]); - // Set alignment class if alignment prop used - let imageAlignment = ''; - if(alignment === "left") { - imageAlignment = styles.leftAlignLightbox; - } else if(alignment === "right") { - imageAlignment = styles.rightAlignLightbox; - } + return () => { + clearTimeout(timeoutId); + }; + }, [isHovered]); const handleMouseEnter = () => { setIsHovered(true); @@ -45,39 +34,39 @@ function Lightbox({ const handleMouseLeave = () => { setIsHovered(false); - setHoverStyle({}); + setExpandImage(false); }; return ( <> - {alt {title && ( - {title} + { title } )} - +
); } diff --git a/website/src/components/lightbox/styles.module.css b/website/src/components/lightbox/styles.module.css index b101178825c..eb280b2feb7 100644 --- a/website/src/components/lightbox/styles.module.css +++ b/website/src/components/lightbox/styles.module.css @@ -1,3 +1,4 @@ + :local(.title) { text-align: center; font-size: small; @@ -25,9 +26,8 @@ margin: 10px 0 10px auto; } -.hovered { - transform: scale(1.3); - transition: transform 0.5s ease; +:local(.hovered) { /* Add the . before the class name */ + filter: drop-shadow(4px 4px 6px #aaaaaae1); + transition: transform 0.3s ease; + z-index: 9999; } - - From 5992553c519f7e31e072be43e2489db0525399b0 Mon Sep 17 00:00:00 2001 From: mirnawong1 <89008547+mirnawong1@users.noreply.github.com> Date: Thu, 11 Jan 2024 17:21:54 +0000 Subject: [PATCH 28/56] Update website/blog/2023-04-17-dbt-squared.md --- website/blog/2023-04-17-dbt-squared.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/blog/2023-04-17-dbt-squared.md b/website/blog/2023-04-17-dbt-squared.md index 4050e52c690..95288384ecc 100644 --- a/website/blog/2023-04-17-dbt-squared.md +++ b/website/blog/2023-04-17-dbt-squared.md @@ -50,7 +50,7 @@ We needed a way to make this architecture manageable when dozens of downstream t The second architectural decision was whether or not to create a single dbt project for all 50+ country teams, or to follow a multi-project approach in which each country would have its own separate dbt project in the shared repo. It was critical that each country team was able to move at different paces and have full control over their domains. This would avoid issues like model name collisions across countries and remove dependencies that would risk cascading errors between countries. Therefore, we opted for a one project per country approach. - + The resulting data flow from core to country teams now follows this pattern. 
The *Sources* database holds all of the raw data in the Redshift cluster and the *Integrated* database contains the curated and ready-for-consumption outputs from the core dbt project. These outputs are termed Source Data Products (SDPs). These SDPs are then leveraged by the core team to build Global Data Products—products tailored to answering business questions for global stakeholders. They are also filtered at the country-level and used as sources to the country-specific Data Products managed by the country teams. These, in turn, are hosted in the respective `affiliate_db_` database. Segregating at the database-level facilitates data governance and privacy management. From b6de35ca78416b78a04a481eba9d99483115097e Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Thu, 11 Jan 2024 17:32:20 +0000 Subject: [PATCH 29/56] Revert "Merge branch 'current' into mwong-hover-images" This reverts commit da4080554c30ed22ca0e45931d2cf4a8ca66f45b, reversing changes made to 39ad19ddf6b660f848b4bd9a790c9a13be021edc. 
--- contributing/content-style-guide.md | 2 +- ...022-11-22-move-spreadsheets-to-your-dwh.md | 10 +- .../blog/2022-11-30-dbt-project-evaluator.md | 4 +- .../blog/2023-01-17-grouping-data-tests.md | 4 +- ...01-ingestion-time-partitioning-bigquery.md | 2 +- website/blog/2023-03-23-audit-helper.md | 16 +-- website/blog/2023-04-17-dbt-squared.md | 4 +- ...ng-a-kimball-dimensional-model-with-dbt.md | 22 ++-- ...23-04-24-framework-refactor-alteryx-dbt.md | 10 +- ...odeling-ragged-time-varying-hierarchies.md | 2 +- .../2023-05-04-generating-dynamic-docs.md | 10 +- ...orical-user-segmentation-model-with-dbt.md | 6 +- ...023-07-03-data-vault-2-0-with-dbt-cloud.md | 4 +- website/blog/2023-07-17-GPT-and-dbt-test.md | 6 +- ...023-08-01-announcing-materialized-views.md | 2 +- .../2023-11-14-specify-prod-environment.md | 2 +- ...-12-11-semantic-layer-on-semantic-layer.md | 8 +- .../blog/2024-01-09-defer-in-development.md | 14 +-- .../clone-incremental-models.md | 6 +- .../dbt-unity-catalog-best-practices.md | 4 +- website/docs/docs/build/about-metricflow.md | 2 +- .../docs/docs/build/custom-target-names.md | 4 +- website/docs/docs/build/data-tests.md | 2 +- .../docs/docs/build/environment-variables.md | 20 ++-- website/docs/docs/build/exposures.md | 4 +- website/docs/docs/build/python-models.md | 4 +- website/docs/docs/build/semantic-models.md | 2 +- website/docs/docs/build/sources.md | 4 +- website/docs/docs/build/sql-models.md | 2 +- .../docs/cloud/about-cloud-develop-defer.md | 2 +- .../about-connections.md | 2 +- .../connect-apache-spark.md | 2 +- .../connect-databricks.md | 2 +- .../connect-redshift-postgresql-alloydb.md | 4 +- .../connect-snowflake.md | 6 +- .../connnect-bigquery.md | 4 +- .../dbt-cloud-ide/develop-in-the-cloud.md | 8 +- .../cloud/dbt-cloud-ide/ide-user-interface.md | 48 ++++---- .../docs/cloud/dbt-cloud-ide/lint-format.md | 20 ++-- .../docs/docs/cloud/git/authenticate-azure.md | 4 +- website/docs/docs/cloud/git/connect-github.md | 8 +- 
website/docs/docs/cloud/git/connect-gitlab.md | 14 +-- .../cloud/git/import-a-project-by-git-url.md | 12 +- website/docs/docs/cloud/git/setup-azure.md | 14 +-- .../docs/cloud/manage-access/audit-log.md | 6 +- .../cloud/manage-access/auth0-migration.md | 26 ++--- .../manage-access/cloud-seats-and-users.md | 16 +-- .../manage-access/enterprise-permissions.md | 4 +- .../docs/cloud/manage-access/invite-users.md | 12 +- .../manage-access/set-up-bigquery-oauth.md | 10 +- .../manage-access/set-up-databricks-oauth.md | 4 +- .../manage-access/set-up-snowflake-oauth.md | 4 +- .../set-up-sso-google-workspace.md | 10 +- .../manage-access/set-up-sso-saml-2.0.md | 66 +++++++---- .../docs/cloud/manage-access/sso-overview.md | 2 +- .../docs/docs/cloud/secure/ip-restrictions.md | 4 +- .../docs/cloud/secure/postgres-privatelink.md | 4 +- .../docs/cloud/secure/redshift-privatelink.md | 14 +-- .../cloud/secure/snowflake-privatelink.md | 2 +- .../docs/docs/cloud/secure/vcs-privatelink.md | 10 +- .../cloud-build-and-view-your-docs.md | 8 +- .../docs/docs/collaborate/documentation.md | 8 +- .../collaborate/explore-multiple-projects.md | 4 +- .../collaborate/git/managed-repository.md | 2 +- .../docs/collaborate/git/merge-conflicts.md | 10 +- .../docs/docs/collaborate/git/pr-template.md | 2 +- .../docs/collaborate/model-performance.md | 6 +- .../collaborate/project-recommendations.md | 4 +- .../docs/docs/dbt-cloud-apis/discovery-api.md | 8 +- .../docs/dbt-cloud-apis/discovery-querying.md | 4 +- .../discovery-use-cases-and-examples.md | 10 +- .../docs/dbt-cloud-apis/service-tokens.md | 2 +- .../docs/docs/dbt-cloud-apis/user-tokens.md | 2 +- website/docs/docs/dbt-cloud-environments.md | 2 +- .../73-Jan-2024/partial-parsing.md | 2 +- .../74-Dec-2023/external-attributes.md | 2 +- .../release-notes/75-Nov-2023/repo-caching.md | 2 +- .../76-Oct-2023/native-retry-support-rn.md | 2 +- .../release-notes/76-Oct-2023/sl-ga.md | 2 +- .../77-Sept-2023/ci-updates-phase2-rn.md | 4 +- 
.../removing-prerelease-versions.md | 2 +- .../release-notes/79-July-2023/faster-run.md | 4 +- .../80-June-2023/lint-format-rn.md | 6 +- .../run-details-and-logs-improvements.md | 2 +- .../81-May-2023/run-history-endpoint.md | 2 +- .../81-May-2023/run-history-improvements.md | 2 +- .../86-Dec-2022/new-jobs-default-as-off.md | 2 +- .../92-July-2022/render-lineage-feature.md | 2 +- .../95-March-2022/ide-timeout-message.md | 2 +- .../95-March-2022/prep-and-waiting-time.md | 2 +- .../dbt-versions/upgrade-core-in-cloud.md | 6 +- website/docs/docs/deploy/artifacts.md | 8 +- website/docs/docs/deploy/ci-jobs.md | 14 +-- .../docs/deploy/continuous-integration.md | 6 +- .../docs/deploy/dashboard-status-tiles.md | 12 +- .../docs/docs/deploy/deploy-environments.md | 20 ++-- website/docs/docs/deploy/deploy-jobs.md | 6 +- .../docs/docs/deploy/deployment-overview.md | 6 +- website/docs/docs/deploy/deployment-tools.md | 6 +- website/docs/docs/deploy/job-commands.md | 2 +- website/docs/docs/deploy/job-notifications.md | 6 +- website/docs/docs/deploy/job-scheduler.md | 4 +- website/docs/docs/deploy/monitor-jobs.md | 6 +- website/docs/docs/deploy/retry-jobs.md | 2 +- website/docs/docs/deploy/run-visibility.md | 6 +- website/docs/docs/deploy/source-freshness.md | 6 +- .../using-the-dbt-ide.md | 8 +- .../use-dbt-semantic-layer/sl-architecture.md | 2 +- website/docs/faqs/API/rotate-token.md | 2 +- .../faqs/Accounts/change-users-license.md | 4 +- .../Accounts/cloud-upgrade-instructions.md | 6 +- website/docs/faqs/Accounts/delete-users.md | 6 +- .../Environments/custom-branch-settings.md | 2 +- website/docs/faqs/Git/git-migration.md | 2 +- website/docs/faqs/Git/gitignore.md | 8 +- website/docs/faqs/Project/delete-a-project.md | 4 +- .../docs/faqs/Troubleshooting/gitignore.md | 12 +- website/docs/guides/adapter-creation.md | 14 +-- website/docs/guides/bigquery-qs.md | 4 +- website/docs/guides/codespace-qs.md | 2 +- website/docs/guides/custom-cicd-pipelines.md | 6 +- 
website/docs/guides/databricks-qs.md | 32 +++--- website/docs/guides/dbt-python-snowpark.md | 106 +++++++++--------- website/docs/guides/dremio-lakehouse.md | 10 +- website/docs/guides/manual-install-qs.md | 10 +- website/docs/guides/microsoft-fabric-qs.md | 4 +- ...oductionize-your-dbt-databricks-project.md | 2 +- website/docs/guides/redshift-qs.md | 20 ++-- website/docs/guides/refactoring-legacy-sql.md | 4 +- website/docs/guides/set-up-ci.md | 6 +- website/docs/guides/snowflake-qs.md | 24 ++-- website/docs/guides/starburst-galaxy-qs.md | 10 +- website/docs/reference/commands/clone.md | 2 +- .../node-selection/graph-operators.md | 2 +- .../resource-configs/bigquery-configs.md | 2 +- .../resource-configs/persist_docs.md | 4 +- .../resource-configs/spark-configs.md | 2 +- .../resource-properties/description.md | 2 +- website/docs/terms/dag.md | 6 +- website/docs/terms/data-lineage.md | 4 +- website/snippets/_cloud-environments-info.md | 7 +- website/snippets/_new-sl-setup.md | 6 +- website/snippets/_sl-run-prod-job.md | 2 +- .../intro-build-models-atop-other-models.md | 2 +- website/src/components/lightbox/index.js | 57 +++------- .../src/components/lightbox/styles.module.css | 7 -- 146 files changed, 583 insertions(+), 586 deletions(-) diff --git a/contributing/content-style-guide.md b/contributing/content-style-guide.md index 357f8f0d751..4ebbf83bf5f 100644 --- a/contributing/content-style-guide.md +++ b/contributing/content-style-guide.md @@ -56,7 +56,7 @@ docs.getdbt.com uses its own CSS, and Docusaurus supports its own specific Markd | Link - topic in different folder | `[Title](/folder/file-name) without file extension`* | | Link - section in topic in same folder | `[Title](/folder/file-name#section-name)`* | | Link - section in topic in different folder | `[Title](/folder/file-name#section-name)`* | -| Image | ``| +| Image | ``| *docs.getdbt.com uses specific folders when linking to topics or sections. 
A successful link syntax begins with one of the following folder paths: diff --git a/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md b/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md index 09274b41a9b..93cf91efeed 100644 --- a/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md +++ b/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md @@ -70,9 +70,9 @@ An obvious choice if you have data to load into your warehouse would be your exi [Fivetran’s browser uploader](https://fivetran.com/docs/files/browser-upload) does exactly what it says on the tin: you upload a file to their web portal and it creates a table containing that data in a predefined schema in your warehouse. With a visual interface to modify data types, it’s easy for anyone to use. And with an account type with the permission to only upload files, you don’t need to worry about your stakeholders accidentally breaking anything either. - + - + A nice benefit of the uploader is support for updating data in the table over time. If a file with the same name and same columns is uploaded, any new records will be added, and existing records (per the ) will be updated. @@ -100,7 +100,7 @@ The main benefit of connecting to Google Sheets instead of a static spreadsheet Instead of syncing all cells in a sheet, you create a [named range](https://fivetran.com/docs/files/google-sheets/google-sheets-setup-guide) and connect Fivetran to that range. Each Fivetran connector can only read a single range—if you have multiple tabs then you’ll need to create multiple connectors, each with its own schema and table in the target warehouse. When a sync takes place, it will [truncate](https://docs.getdbt.com/terms/ddl#truncate) and reload the table from scratch as there is no primary key to use for matching. 
- + Beware of inconsistent data types though—if someone types text into a column that was originally numeric, Fivetran will automatically convert the column to a string type which might cause issues in your downstream transformations. [The recommended workaround](https://fivetran.com/docs/files/google-sheets#typetransformationsandmapping) is to explicitly cast your types in [staging models](https://docs.getdbt.com/best-practices/how-we-structure/2-staging) to ensure that any undesirable records are converted to null. @@ -119,7 +119,7 @@ Beware of inconsistent data types though—if someone types text into a column t I’m a big fan of [Fivetran’s Google Drive connector](https://fivetran.com/docs/files/google-drive); in the past I’ve used it to streamline a lot of weekly reporting. It allows stakeholders to use a tool they’re already familiar with (Google Drive) instead of dealing with another set of credentials. Every file uploaded into a specific folder on Drive (or [Box, or consumer Dropbox](https://fivetran.com/docs/files/magic-folder)) turns into a table in your warehouse. - + Like the Google Sheets connector, the data types of the columns are determined automatically. Dates, in particular, are finicky though—if you can control your input data, try to get it into [ISO 8601 format](https://xkcd.com/1179/) to minimize the amount of cleanup you have to do on the other side. @@ -174,7 +174,7 @@ Each of the major data warehouses also has native integrations to import spreads Snowflake’s options are robust and user-friendly, offering both a [web-based loader](https://docs.snowflake.com/en/user-guide/data-load-web-ui.html) as well as [a bulk importer](https://docs.snowflake.com/en/user-guide/data-load-bulk.html). The web loader is suitable for small to medium files (up to 50MB) and can be used for specific files, all files in a folder, or files in a folder that match a given pattern. 
It’s also the most provider-agnostic, with support for Amazon S3, Google Cloud Storage, Azure and the local file system. - + ### BigQuery diff --git a/website/blog/2022-11-30-dbt-project-evaluator.md b/website/blog/2022-11-30-dbt-project-evaluator.md index b936d4786cd..3ea7a459c35 100644 --- a/website/blog/2022-11-30-dbt-project-evaluator.md +++ b/website/blog/2022-11-30-dbt-project-evaluator.md @@ -20,7 +20,7 @@ If you attended [Coalesce 2022](https://www.youtube.com/watch?v=smbRwmcM1Ok), yo Don’t believe me??? Here’s photographic proof. - + Since the inception of dbt Labs, our team has been embedded with a variety of different data teams — from an over-stretched-data-team-of-one to a data-mesh-multiverse. @@ -120,4 +120,4 @@ If something isn’t working quite right or you have ideas for future functional Together, we can ensure that dbt projects across the galaxy are set up for success as they grow to infinity and beyond. - + diff --git a/website/blog/2023-01-17-grouping-data-tests.md b/website/blog/2023-01-17-grouping-data-tests.md index 3648837302b..23fcce6d27e 100644 --- a/website/blog/2023-01-17-grouping-data-tests.md +++ b/website/blog/2023-01-17-grouping-data-tests.md @@ -43,11 +43,11 @@ So what do we discover when we validate our data by group? Testing for monotonicity, we find many poorly behaved turnstiles. Unlike the well-behaved dark blue line, other turnstiles seem to _decrement_ versus _increment_ with each rotation while still others cyclically increase and plummet to zero – perhaps due to maintenance events, replacements, or glitches in communication with the central server. - + Similarly, while no expected timestamp is missing from the data altogether, a more rigorous test of timestamps _by turnstile_ reveals between roughly 50-100 missing observations for any given period. 
- + _Check out this [GitHub gist](https://gist.github.com/emilyriederer/4dcc6a05ea53c82db175e15f698a1fb6) to replicate these views locally._ diff --git a/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md b/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md index 51a62006ee8..99ce142d5ed 100644 --- a/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md +++ b/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md @@ -125,7 +125,7 @@ In both cases, the operation can be done on a single partition at a time so it r On a 192 GB partition here is how the different methods compare: - + Also, the `SELECT` statement consumed more than 10 hours of slot time while `MERGE` statement took days of slot time. diff --git a/website/blog/2023-03-23-audit-helper.md b/website/blog/2023-03-23-audit-helper.md index 106715c5e4f..8599ad5eb5d 100644 --- a/website/blog/2023-03-23-audit-helper.md +++ b/website/blog/2023-03-23-audit-helper.md @@ -19,7 +19,7 @@ It is common for analytics engineers (AE) and data analysts to have to refactor Not only is that approach time-consuming, but it is also prone to naive assumptions that values match based on aggregate measures (such as counts or sums). To provide a better, more accurate approach to auditing, dbt Labs has created the `audit_helper` package. `audit_helper` is a package for dbt whose main purpose is to audit data by comparing two tables (the original one versus a refactored model). It uses a simple and intuitive query structure that enables quickly comparing tables based on the column values, row amount, and even column types (for example, to make sure that a given column is numeric in both your table and the original one). Figure 1 graphically displays the workflow and where `audit_helper` is positioned in the refactoring process. 
- + Now that it is clear where the `audit_helper` package is positioned in the refactoring process, it is important to highlight the benefits of using audit_helper (and ultimately, of auditing refactored models). Among the benefits, we can mention: - **Quality assurance**: Assert that a refactored model is reaching the same output as the original model that is being refactored. @@ -57,12 +57,12 @@ According to the `audit_helper` package documentation, this macro comes in handy ### How it works When you run the dbt audit model, it will compare all columns, row by row. To count for the match, every column in a row from one source must exactly match a row from another source, as illustrated in the example in Figure 2 below: - + As shown in the example, the model is compared line by line, and in this case, all lines in both models are equivalent and the result should be 100%. Figure 3 below depicts a row in which two of the three columns are equal and only the last column of row 1 has divergent values. In this case, despite the fact that most of row 1 is identical, that row will not be counted towards the final result. In this example, only row 2 and row 3 are valid, yielding a 66.6% match in the total of analyzed rows. - + As previously stated, for the match to be valid, all column values of a model’s row must be equal to the other model. This is why we sometimes need to exclude columns from the comparison (such as date columns, which can have a time zone difference from the original model to the refactored — we will discuss tips like these below). @@ -103,12 +103,12 @@ Let’s understand the arguments used in the `compare_queries` macro: - `summarize` (optional): This argument allows you to switch between a summary or detailed (verbose) view of the compared data. This argument accepts true or false values (its default is set to be true). 3. 
Replace the sources from the example with your own - + As illustrated in Figure 4, using the `ref` statements allows you to easily refer to your development model, and using the full path makes it easy to refer to the original table (which will be useful when you are refactoring a SQL Server Stored Procedure or Alteryx Workflow that is already being materialized in the data warehouse). 4. Specify your comparison columns - + Delete the example columns and replace them with the columns of your models, exactly as they are written in each model. You should rename/alias the columns to match, as well as ensuring they are in the same order within the `select` clauses. @@ -129,7 +129,7 @@ Let’s understand the arguments used in the `compare_queries` macro: ``` The output will be similar to the one shown in Figure 6 below: - +
The output is presented in table format, with each column explained below:
@@ -155,7 +155,7 @@ While we can surely rely on that overview to validate the final refactored model A really useful way to check out which specific columns are driving down the match percentage between tables is the `compare_column_values` macro that allows us to audit column values. This macro requires a column to be set, so it can be used as an anchor to compare entries between the refactored dbt model column and the legacy table column. Figure 7 illustrates how the `compare_column_values` macro works. - + The macro’s output summarizes the status of column compatibility, breaking it down into different categories: perfect match, both are null, values do not match, value is null in A only, value is null in B only, missing from A and missing from B. This level of detail makes it simpler for the AE or data analyst to figure out what can be causing incompatibility issues between the models. While refactoring a model, it is common that some keys used to join models are inconsistent, bringing up unwanted null values on the final model as a result, and that would cause the audit row query to fail, without giving much more detail. @@ -224,7 +224,7 @@ Also, we can see that the example code includes a table printing option enabled But unlike with the `compare_queries` macro, if you have kept the printing function enabled, you should expect a table to be printed in the command line when you run the model, as shown in Figure 8.
Otherwise, it will be materialized on your data warehouse like this: - + The `compare_column_values` macro separates column auditing results in seven different labels: - **Perfect match**: count of rows (and relative percentage) where the column values compared between both tables are equal and not null; diff --git a/website/blog/2023-04-17-dbt-squared.md b/website/blog/2023-04-17-dbt-squared.md index 95288384ecc..5cac73459a8 100644 --- a/website/blog/2023-04-17-dbt-squared.md +++ b/website/blog/2023-04-17-dbt-squared.md @@ -54,7 +54,7 @@ The second architectural decision was whether or not to create a single dbt proj The resulting data flow from core to country teams now follows this pattern. The *Sources* database holds all of the raw data in the Redshift cluster and the *Integrated* database contains the curated and ready-for-consumption outputs from the core dbt project. These outputs are termed Source Data Products (SDPs). These SDPs are then leveraged by the core team to build Global Data Products—products tailored to answering business questions for global stakeholders. They are also filtered at the country-level and used as sources to the country-specific Data Products managed by the country teams. These, in turn, are hosted in the respective `affiliate_db_` database. Segregating at the database-level facilitates data governance and privacy management. - + ### People @@ -68,7 +68,7 @@ The success of this program relied on adopting DevOps practices from the start. Often overlooked, this third pillar of process can be the key to success when scaling a global platform. Simple things, such as accounting for time zone differences, can determine whether a message gets across the board. To facilitate the communication and coordination between Global and Country teams, all the teams follow the same sprint cycle, and we hold weekly scrum of scrums. 
We needed to set up extensive onboarding documentation, ensure newcomers had proper training and guidance, and create dedicated slack channels for announcements, incident reporting, and occasional random memes, helping build a community that stretches from Brazil to Malaysia. - + ## The solution: dbt Squared diff --git a/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md b/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md index eb357698d4b..ab364749eff 100644 --- a/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md +++ b/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md @@ -14,7 +14,7 @@ is_featured: true Dimensional modeling is one of many data modeling techniques that are used by data practitioners to organize and present data for analytics. Other data modeling techniques include Data Vault (DV), Third Normal Form (3NF), and One Big Table (OBT) to name a few. - + While the relevance of dimensional modeling [has been debated by data practitioners](https://discourse.getdbt.com/t/is-kimball-dimensional-modeling-still-relevant-in-a-modern-data-warehouse/225/6), it is still one of the most widely adopted data modeling technique for analytics. @@ -39,7 +39,7 @@ Dimensional modeling is a technique introduced by Ralph Kimball in 1996 with his The goal of dimensional modeling is to take raw data and transform it into Fact and Dimension tables that represent the business. - + The benefits of dimensional modeling are: @@ -143,7 +143,7 @@ Examine the database source schema below, paying close attention to: - Keys - Relationships - + ### Step 8: Query the tables @@ -185,7 +185,7 @@ Now that you’ve set up the dbt project, database, and have taken a peek at the Identifying the business process is done in collaboration with the business user. The business user has context around the business objectives and business processes, and can provide you with that information. 
- + Upon speaking with the CEO of AdventureWorks, you learn the following information: @@ -222,11 +222,11 @@ There are two tables in the sales schema that catch our attention. These two tab - The `sales.salesorderheader` table contains information about the credit card used in the order, the shipping address, and the customer. Each record in this table represents an order header that contains one or more order details. - The `sales.salesorderdetail` table contains information about the product that was ordered, and the order quantity and unit price, which we can use to calculate the revenue. Each record in this table represents a single order detail. - + Let’s define a fact table called `fct_sales` which joins `sales.salesorderheader` and `sales.salesorderdetail` together. Each record in the fact table (also known as the [grain](https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/grain/)) is an order detail. - + ### Dimension tables @@ -250,19 +250,19 @@ Based on the business questions that our business user would like answered, we c There are different ways we could create the dimension tables. We could use the existing relationships between the tables as depicted in the diagram below. - + This is known as a snowflake schema design, where the fact table is the centre of the snowflake, and there are many fractals branching off the centre of the snowflake. However, this results in many joins that need to be performed by the consumer of the dimensional model. Instead, we can denormalize the dimension tables by performing joins. - + This is known as a star schema and this approach reduces the amount of joins that need to be performed by the consumer of the dimensional model. 
Using the star schema approach, we can identify 6 dimensions as shown below that will help us answer the business questions: - + - `dim_product` : a dimension table that joins `product` , `productsubcategory`, `productcategory` - `dim_address` : a dimension table that joins `address` , `stateprovince`, `countryregion` @@ -617,7 +617,7 @@ Great work, you have successfully created your very first fact and dimension tab Let’s make it easier for consumers of our dimensional model to understand the relationships between tables by creating an [Entity Relationship Diagram (ERD)](https://www.visual-paradigm.com/guide/data-modeling/what-is-entity-relationship-diagram/). - + The ERD will enable consumers of our dimensional model to quickly identify the keys and relationship type (one-to-one, one-to-many) that need to be used to join tables. @@ -694,7 +694,7 @@ Using `dbt_utils.star()`, we select all columns except the surrogate key columns We can then build the OBT by running `dbt run`. Your dbt DAG should now look like this: - + Congratulations, you have reached the end of this tutorial. If you want to learn more, please see the learning resources below on dimensional modeling. diff --git a/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md b/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md index 2c6a9d87591..46cfcb58cdd 100644 --- a/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md +++ b/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md @@ -17,7 +17,7 @@ Alteryx is a visual data transformation platform with a user-friendly interface Transforming data to follow business rules can be a complex task, especially with the increasing amount of data collected by companies. To reduce such complexity, data transformation solutions designed as drag-and-drop tools can be seen as more intuitive, since analysts can visualize the steps taken to transform data. 
One example of a popular drag-and-drop transformation tool is Alteryx which allows business analysts to transform data by dragging and dropping operators in a canvas. The graphic interface of Alteryx Designer is presented in **Figure 1**. - + Nonetheless, as data workflows become more complex, Alteryx lacks the modularity, documentation, and version control capabilities that these flows require. In this sense, dbt may be a more suitable solution to building resilient and modular data pipelines due to its focus on data modeling. @@ -62,7 +62,7 @@ This blog post reports a consulting project for a major client at Indicium Tech When the client hired Indicium, they had dozens of Alteryx workflows built and running daily solely for the marketing team, which was the focus of the project. For the marketing team, the Alteryx workflows had to be executed in the correct order since they were interdependent, which means one Alteryx workflow used the outcome of the previous one, and so on. The main Alteryx workflows run daily by the marketing team took about 6 hours to run. Another important aspect to consider was that if a model had not finished running when the next one downstream began to run, the data would be incomplete, requiring the workflow to be run again. The execution of all models was usually scheduled to run overnight and by early morning, so the data would be up to date the next day. But if there was an error the night before, the data would be incorrect or out of date. **Figure 3** exemplifies the scheduler. - + Data lineage was a point that added a lot of extra labor because it was difficult to identify which models were dependent on others with so many Alteryx workflows built. When the number of workflows increased, it required a long time to create a view of that lineage in another software. 
So, if a column's name changed in a model due to a change in the model's source, the marketing analysts would have to map which downstream models were impacted by such change to make the necessary adjustments. Because model lineage was mapped manually, it was a challenge to keep it up to date. @@ -89,7 +89,7 @@ The first step is to validate all data sources and create one com It is essential to click on each data source (the green book icons on the leftmost side of **Figure 5**) and examine whether any transformations have been done inside that data source query. It is very common for a source icon to contain more than one data source or filter, which is why this step is important. The next step is to follow the workflow and transcribe the transformations into SQL queries in the dbt models to replicate the same data transformations as in the Alteryx workflow. - + For this step, we identified which operators were used in the data source (for example, joining data, order columns, group by, etc). Usually the Alteryx operators are pretty self-explanatory and all the information needed for understanding appears on the left side of the menu. We also checked the documentation to understand how each Alteryx operator works behind the scenes. @@ -102,7 +102,7 @@ Auditing large models, with sometimes dozens of columns and millions of rows, ca In this project, we used [the `audit_helper` package](https://github.com/dbt-labs/dbt-audit-helper), because it provides more robust auditing macros and offers more automation possibilities for our use case. To that end, we needed to have both the legacy Alteryx workflow output table and the refactored dbt model materialized in the project’s data warehouse. Then we used the macros available in `audit_helper` to compare query results, data types, column values, row numbers and many more things that are available within the package. 
For an in-depth explanation and tutorial on how to use the `audit_helper` package, check out [this blog post](https://docs.getdbt.com/blog/audit-helper-for-migration). **Figure 6** graphically illustrates the validation logic behind audit_helper. - + #### Step 4: Duplicate reports and connect them to the dbt refactored models @@ -120,7 +120,7 @@ The conversion proved to be of great value to the client due to three main aspec - Improved workflow visibility: dbt’s support for documentation and testing, associated with dbt Cloud, allows for great visibility of the workflow’s lineage execution, accelerating errors and data inconsistencies identification and troubleshooting. More than once, our team was able to identify the impact of one column’s logic alteration in downstream models much earlier than these Alteryx models. - Workflow simplification: dbt’s modularized approach of data modeling, aside from accelerating total run time of the data workflow, simplified the construction of new tables, based on the already existing modules, and improved code readability. - + As we can see, refactoring Alteryx to dbt was an important step in the direction of data availability, and allowed for much more agile processes for the client’s data team. With less time dedicated to manually executing sequential Alteryx workflows that took hours to complete, and searching for errors in each individual file, the analysts could focus on what they do best: **getting insights from the data and generating value from them**. 
diff --git a/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md b/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md index 2b00787cc07..f719bdb40cb 100644 --- a/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md +++ b/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md @@ -22,7 +22,7 @@ To help visualize this data, we're going to pretend we are a company that manufa Obviously, a real bike could have a hundred or more separate components. To keep things simple for this article, let's just consider the bike, the frame, a wheel, the wheel rim, tire, and tube. Our component hierarchy looks like: - + This hierarchy is *ragged* because different paths through the hierarchy terminate at different depths. It is *time-varying* because specific components can be added and removed. diff --git a/website/blog/2023-05-04-generating-dynamic-docs.md b/website/blog/2023-05-04-generating-dynamic-docs.md index b6e8d929e72..1e704178b0a 100644 --- a/website/blog/2023-05-04-generating-dynamic-docs.md +++ b/website/blog/2023-05-04-generating-dynamic-docs.md @@ -35,7 +35,7 @@ This results in a lot of the same columns (e.g. `account_id`) existing in differ In fact, I found a better way using some CLI commands, the dbt Codegen package and docs blocks. I also made the following meme in the [dbt Community Slack](https://www.getdbt.com/community/join-the-community/) channel #memes-and-off-topic-chatter to encapsulate this method: - + ## What pain is being solved? @@ -279,7 +279,7 @@ To confirm the formatting works, run the following command to get dbt Docs up an ``` $ dbt docs && dbt docs serve ``` - + Here, you can confirm that the column descriptions using the doc blocks are working as intended. 
@@ -326,7 +326,7 @@ user_id ``` Now, open your code editor, and replace `(.*)` with `{% docs column__activity_based_interest__$1 %}\n\n{% enddocs %}\n`, which will result in the following in your markdown file: - + Now you can add documentation to each of your columns. @@ -334,7 +334,7 @@ Now you can add documentation to each of your columns. You can programmatically identify all columns, and have them point towards the newly-created documentation. In your code editor, replace `\s{6}- name: (.*)\n description: ""` with ` - name: $1\n description: "{{ doc('column__activity_based_interest__$1') }}`: - + ⚠️ Some of your columns may already be available in existing docs blocks. In this example, the following replacements are done: - `{{ doc('column__activity_based_interest__user_id') }}` → `{{ doc("column_user_id") }}` @@ -343,7 +343,7 @@ You can programmatically identify all columns, and have them point towards the n ## Check that everything works Run `dbt docs generate`. If there are syntax errors, this will be found out at this stage. If successful, we can run `dbt docs serve` to perform a smoke test and ensure everything looks right: - + ## Additional considerations diff --git a/website/blog/2023-05-08-building-a-historical-user-segmentation-model-with-dbt.md b/website/blog/2023-05-08-building-a-historical-user-segmentation-model-with-dbt.md index ac6aee5176c..a8b0e1f9f8c 100644 --- a/website/blog/2023-05-08-building-a-historical-user-segmentation-model-with-dbt.md +++ b/website/blog/2023-05-08-building-a-historical-user-segmentation-model-with-dbt.md @@ -21,7 +21,7 @@ Take for example a Customer Experience (CX) team that uses Salesforce as a CRM. An improvement to this would be to prioritize the tickets based on the customer segment, answering our most valuable customers first. An Analytics Engineer can build a segmentation to identify the power users (for example with an RFM approach) and store it in the data warehouse. 
The Data Engineering team can then export that user attribute to the CRM, allowing the customer experience team to build rules on top of it. - + ## Problems @@ -58,7 +58,7 @@ The goal of RFM analysis is to segment customers into groups based on how recent We are going to use just the Recency and Frequency matrix, and use the Monetary value as an accessory attribute. This is a common approach in companies where the Frequency and the Monetary Value are highly correlated. - + ### RFM model for current segment @@ -390,7 +390,7 @@ FROM current_segments With the new approach, our dependency graph would look like this: - + - For analysts that want to see how the segments changed over time, they can query the historical model. There is also an option to build an aggregated model before loading it in a Business Intelligence tool. - For ML model training, data scientists and machine learning practitioners can import this model into their notebooks or their feature store, instead of rebuilding the attributes from scratch. diff --git a/website/blog/2023-07-03-data-vault-2-0-with-dbt-cloud.md b/website/blog/2023-07-03-data-vault-2-0-with-dbt-cloud.md index 98586b2552c..6b1012a5320 100644 --- a/website/blog/2023-07-03-data-vault-2-0-with-dbt-cloud.md +++ b/website/blog/2023-07-03-data-vault-2-0-with-dbt-cloud.md @@ -26,7 +26,7 @@ A new development in the city? No problem! Just hook up the new pipes to the res Data Vault is the dam and reservoir: it is the well-engineered data model to structure an organization’s data from source systems for use by downstream data projects – rather than each team collecting data straight from the source. The Data Vault data model is designed using a few well-applied principles, and in practice, pools source data so it is available for use by all downstream consumers. This promotes a scalable data warehouse through reusability and modularity. 
- + ## Data Vault components @@ -139,7 +139,7 @@ Within the [dq_tools](https://hub.getdbt.com/infinitelambda/dq_tools/latest/) _p To help you get started, [we have created a template GitHub project](https://github.com/infinitelambda/dbt-data-vault-template) you can utilize to understand the basic principles of building Data Vault with dbt Cloud using one of the abovementioned packages. But if you need help building your Data Vault, get in touch. - + ### Entity Relation Diagrams (ERDs) and dbt diff --git a/website/blog/2023-07-17-GPT-and-dbt-test.md b/website/blog/2023-07-17-GPT-and-dbt-test.md index 12e380eb220..84f756919a5 100644 --- a/website/blog/2023-07-17-GPT-and-dbt-test.md +++ b/website/blog/2023-07-17-GPT-and-dbt-test.md @@ -55,7 +55,7 @@ We all know how ChatGPT can digest very complex prompts, but as this is a tool f Opening ChatGPT with GPT4, my first prompt is usually along these lines: - + And the output of this simple prompt is nothing short of amazing: @@ -118,7 +118,7 @@ Back in my day (5 months ago), ChatGPT with GPT 3.5 didn’t have much context o A prompt for it would look something like: - + ## Specify details on generic tests in your prompts @@ -133,7 +133,7 @@ Accepted_values and relationships are slightly trickier but the model can be adj One way of doing this is with a prompt like this: - + Which results in the following output: diff --git a/website/blog/2023-08-01-announcing-materialized-views.md b/website/blog/2023-08-01-announcing-materialized-views.md index 6534e1d0b56..eb9716e73a5 100644 --- a/website/blog/2023-08-01-announcing-materialized-views.md +++ b/website/blog/2023-08-01-announcing-materialized-views.md @@ -103,7 +103,7 @@ When we talk about using materialized views in development, the question to thin Outside of the scheduling part, development will be pretty standard. 
Your pipeline is likely going to look something like this: - + This is assuming you have a near real time pipeline where you are pulling from a streaming data source like a Kafka Topic via an ingestion tool of your choice like Snowpipe for Streaming into your data platform. After your data is in the data platform, you will: diff --git a/website/blog/2023-11-14-specify-prod-environment.md b/website/blog/2023-11-14-specify-prod-environment.md index ecb6ddc8b25..c6ad2b31027 100644 --- a/website/blog/2023-11-14-specify-prod-environment.md +++ b/website/blog/2023-11-14-specify-prod-environment.md @@ -56,7 +56,7 @@ By using the environment as the arbiter of state, any time a change is made to y ## The easiest way to break apart your jobs {#how} - + For most projects, changing from a job-centric to environment-centric approach to metadata is straightforward and immediately pays dividends as described above. Assuming that your Staging/CI and Production jobs are currently intermingled, you can extricate them as follows: diff --git a/website/blog/2023-12-11-semantic-layer-on-semantic-layer.md b/website/blog/2023-12-11-semantic-layer-on-semantic-layer.md index c5e541358b9..44499c51ec5 100644 --- a/website/blog/2023-12-11-semantic-layer-on-semantic-layer.md +++ b/website/blog/2023-12-11-semantic-layer-on-semantic-layer.md @@ -31,7 +31,7 @@ There are [plenty of other great resources](https://docs.getdbt.com/docs/build/p - + @@ -42,7 +42,7 @@ Let’s walk through the DAG from left to right: First, we have raw tables from What [is a semantic model](https://docs.getdbt.com/docs/build/semantic-models)? Put simply, semantic models contain the components we need to build metrics. Semantic models are YAML files that live in your dbt project. They contain metadata about your dbt models in a format that MetricFlow, the query builder that powers the semantic layer, can understand. 
The DAG below in [dbt Explorer](https://docs.getdbt.com/docs/collaborate/explore-projects) shows the metrics we’ve built off of `semantic_layer_queries`. - + Let’s dig into semantic models and metrics a bit more, and explain some of the data modeling decisions we made. First, we needed to decide what model to use as a base for our semantic model. We decide to use`fct_semantic_layer`queries as our base model because defining a semantic model on top of a normalized fact table gives us maximum flexibility to join to other tables. This increased the number of dimensions available, which means we can answer more questions. @@ -79,13 +79,13 @@ To query to Semantic Layer you have two paths: you can query metrics directly th The leg work of building our pipeline and defining metrics is all done, which makes last-mile consumption much easier. First, we set up a launch dashboard in Hex as the source of truth for semantic layer product metrics. This tool is used by cross-functional partners like marketing, sales, and the executive team to easily check product and usage metrics like total semantic layer queries, or weekly active semantic layer users. To set up our Hex connection, we simply enter a few details from our dbt Cloud environment and then we can work with metrics directly in Hex notebooks. We can use the JDBC interface, or use Hex’s GUI metric builder to build reports. We run all our WBRs off this dashboard, which allows us to spot trends in consumption and react quickly to changes in our business. - + On the finance and operations side, product usage data is crucial to making informed pricing decisions. All our pricing models are created in spreadsheets, so we leverage the Google Sheets integration to give those teams access to consistent data sets without the need to download CSVs from the Hex dashboard. 
This lets the Pricing team add dimensional slices, like tier and company size, to the data in a self-serve manner without having to request data team resources to generate those insights. This allows our finance team to iteratively build financial models and be more self-sufficient in pulling data, instead of relying on data team resources. - + As a former data scientist and data engineer, I personally think this is a huge improvement over the approach I would have used without the semantic layer. My old approach would have been to materialize One Big Table with all the numeric and categorical columns I needed for analysis. Then write a ton of SQL in Hex or various notebooks to create reports for stakeholders. Inevitably I’m signing up for more development cycles to update the pipeline whenever a new dimension needs to be added or the data needs to be aggregated in a slightly different way. From a data team management perspective, using a central semantic layer saves data analysts cycles since users can more easily self-serve. At every company I’ve ever worked at, data analysts are always in high demand, with more requests than they can reasonably accomplish. This means any time a stakeholder can self-serve their data without pulling us in is a huge win. diff --git a/website/blog/2024-01-09-defer-in-development.md b/website/blog/2024-01-09-defer-in-development.md index 7daa4c2f990..96e2ed53f85 100644 --- a/website/blog/2024-01-09-defer-in-development.md +++ b/website/blog/2024-01-09-defer-in-development.md @@ -58,15 +58,15 @@ Let’s think back to the hypothetical above — what if we made use of the Let’s take a look at a simplified example — let’s say your project looks like this in production: - + And you’re tasked with making changes to `model_f`. 
Without defer, you would need to make sure to at minimum execute a `dbt run -s +model_f` to ensure all the upstream dependencies of `model_f` are present in your development schema so that you can start to run `model_f`.* You just spent a whole bunch of time and money duplicating your models, and now your warehouse looks like this: - + With defer, we should not build anything other than the models that have changed, and are now different from their production counterparts! Let’s tell dbt to use production metadata to resolve our refs, and only build the model I have changed — that command would be `dbt run -s model_f --defer` .** - + This results in a *much slimmer build* — we read data in directly from the production version of `model_b` and `model_c`, and don’t have to worry about building anything other than what we selected! @@ -80,7 +80,7 @@ dbt Cloud offers a seamless deferral experience in both the dbt Cloud IDE and th In the dbt Cloud IDE, there’s as simple toggle switch labeled `Defer to production`. Simply enabling this toggle will defer your command to the production environment when you run any dbt command in the IDE! - + The cloud CLI has this setting *on by default* — there’s nothing else you need to do to set this up! If you prefer not to defer, you can pass the `--no-defer` flag to override this behavior. 
You can also set an environment other than your production environment as the deferred to environment in your `dbt-cloud` settings in your `dbt_project.yml` : @@ -100,13 +100,13 @@ One of the major gotchas in the defer workflow is that when you’re in defer mo Let’s take a look at that example above again, and pretend that some time before we went to make this edit, we did some work on `model_c`, and we have a local copy of `model_c` hanging out in our development schema: - + When you run `dbt run -s model_f --defer` , dbt will detect the development copy of `model_c` and say “Hey, y’know, I bet Dave is working on that model too, and he probably wants to make sure his changes to `model_c` work together with his changes to `model_f` . Because I am a kind and benevolent data transformation tool, I’ll make sure his `{{ ref('model_c') }}` function compiles to his development changes!” Thanks dbt! As a result, we’ll effectively see this behavior when we run our command: - + Where our code would compile from @@ -155,6 +155,6 @@ While defer is a faster and cheaper option for most folks in most situations, de ### Call me Willem Defer - + Defer to prod is a powerful way to improve your development velocity with dbt, and dbt Cloud makes it easier than ever to make use of this feature! You too could look this cool while you’re saving time and money developing on your dbt projects! diff --git a/website/docs/best-practices/clone-incremental-models.md b/website/docs/best-practices/clone-incremental-models.md index 99982042de1..11075b92161 100644 --- a/website/docs/best-practices/clone-incremental-models.md +++ b/website/docs/best-practices/clone-incremental-models.md @@ -17,11 +17,11 @@ Imagine you've created a [Slim CI job](/docs/deploy/continuous-integration) in d - Run the command `dbt build --select state:modified+` to run and test all of the models you've modified and their downstream dependencies.
- Trigger whenever a developer on your team opens a PR against the main branch. - + Now imagine your dbt project looks something like this in the DAG: - + When you open a pull request (PR) that modifies `dim_wizards`, your CI job will kickoff and build _only the modified models and their downstream dependencies_ (in this case, `dim_wizards` and `fct_orders`) into a temporary schema that's unique to your PR. @@ -49,7 +49,7 @@ You'll have two commands for your dbt Cloud CI check to execute: Because of your first clone step, the incremental models selected in your `dbt build` on the second step will run in incremental mode. - + Your CI jobs will run faster, and you're more accurately mimicking the behavior of what will happen once the PR has been merged into main. diff --git a/website/docs/best-practices/dbt-unity-catalog-best-practices.md b/website/docs/best-practices/dbt-unity-catalog-best-practices.md index 5f230263cf8..a55e1d121af 100644 --- a/website/docs/best-practices/dbt-unity-catalog-best-practices.md +++ b/website/docs/best-practices/dbt-unity-catalog-best-practices.md @@ -21,11 +21,11 @@ If you use multiple Databricks workspaces to isolate development from production To do so, use dbt's [environment variable syntax](https://docs.getdbt.com/docs/dbt-cloud/using-dbt-cloud/cloud-environment-variables#special-environment-variables) for Server Hostname of your Databricks workspace URL and HTTP Path for the SQL warehouse in your connection settings. Note that Server Hostname still needs to appear to be a valid domain name to pass validation checks, so you will need to hard-code the domain suffix on the URL, eg `{{env_var('DBT_HOSTNAME')}}.cloud.databricks.com` and the path prefix for your warehouses, eg `/sql/1.0/warehouses/{{env_var('DBT_HTTP_PATH')}}`. - + When you create environments in dbt Cloud, you can assign environment variables to populate the connection information dynamically. 
Don’t forget to make sure the tokens you use in the credentials for those environments were generated from the associated workspace. - + ## Access Control diff --git a/website/docs/docs/build/about-metricflow.md b/website/docs/docs/build/about-metricflow.md index e229df2dfc8..ea2efcabf06 100644 --- a/website/docs/docs/build/about-metricflow.md +++ b/website/docs/docs/build/about-metricflow.md @@ -55,7 +55,7 @@ For a semantic model, there are three main pieces of metadata: * [Dimensions](/docs/build/dimensions) — These are the ways you want to group or slice/dice your metrics. * [Measures](/docs/build/measures) — The aggregation functions that give you a numeric result and can be used to create your metrics. - + ### Metrics diff --git a/website/docs/docs/build/custom-target-names.md b/website/docs/docs/build/custom-target-names.md index 4786641678d..ac7036de572 100644 --- a/website/docs/docs/build/custom-target-names.md +++ b/website/docs/docs/build/custom-target-names.md @@ -21,9 +21,9 @@ where created_at > date_trunc('month', current_date) To set a custom target name for a job in dbt Cloud, configure the **Target Name** field for your job in the Job Settings page. - + ## dbt Cloud IDE When developing in dbt Cloud, you can set a custom target name in your development credentials. Go to your account (from the gear menu in the top right hand corner), select the project under **Credentials**, and update the target name. - + diff --git a/website/docs/docs/build/data-tests.md b/website/docs/docs/build/data-tests.md index 7c12e5d7059..d981d7e272d 100644 --- a/website/docs/docs/build/data-tests.md +++ b/website/docs/docs/build/data-tests.md @@ -245,7 +245,7 @@ Normally, a data test query will calculate failures as part of its execution. 
If This workflow allows you to query and examine failing records much more quickly in development: - + Note that, if you elect to store test failures: * Test result tables are created in a schema suffixed or named `dbt_test__audit`, by default. It is possible to change this value by setting a `schema` config. (For more details on schema naming, see [using custom schemas](/docs/build/custom-schemas).) diff --git a/website/docs/docs/build/environment-variables.md b/website/docs/docs/build/environment-variables.md index 14076352ac1..3f2aebd0036 100644 --- a/website/docs/docs/build/environment-variables.md +++ b/website/docs/docs/build/environment-variables.md @@ -17,7 +17,7 @@ Environment variables in dbt Cloud must be prefixed with either `DBT_` or `DBT_E Environment variable values can be set in multiple places within dbt Cloud. As a result, dbt Cloud will interpret environment variables according to the following order of precedence (lowest to highest): - + There are four levels of environment variables: 1. the optional default argument supplied to the `env_var` Jinja function in code @@ -30,7 +30,7 @@ There are four levels of environment variables: To set environment variables at the project and environment level, click **Deploy** in the top left, then select **Environments**. Click **Environments Variables** to add and update your environment variables. - + @@ -38,7 +38,7 @@ You'll notice there is a `Project Default` column. This is a great place to set To the right of the `Project Default` column are all your environments. Values set at the environment level take priority over the project level default value. This is where you can tell dbt Cloud to interpret an environment value differently in your Staging vs. Production environment, as example. 
- + @@ -48,12 +48,12 @@ You may have multiple jobs that run in the same environment, and you'd like the When setting up or editing a job, you will see a section where you can override environment variable values defined at the environment or project level. - + Every job runs in a specific, deployment environment, and by default, a job will inherit the values set at the environment level (or the highest precedence level set) for the environment in which it runs. If you'd like to set a different value at the job level, edit the value to override it. - + **Overriding environment variables at the personal level** @@ -61,11 +61,11 @@ Every job runs in a specific, deployment environment, and by default, a job will You can also set a personal value override for an environment variable when you develop in the dbt integrated developer environment (IDE). By default, dbt Cloud uses environment variable values set in the project's development environment. To see and override these values, click the gear icon in the top right. Under "Your Profile," click **Credentials** and select your project. Click **Edit** and make any changes in "Environment Variables." - + To supply an override, developers can edit and specify a different value to use. These values will be respected in the IDE both for the Results and Compiled SQL tabs. - + :::info Appropriate coverage If you have not set a project level default value for every environment variable, it may be possible that dbt Cloud does not know how to interpret the value of an environment variable in all contexts. In such cases, dbt will throw a compilation error: "Env var required but not provided". @@ -77,7 +77,7 @@ If you change the value of an environment variable mid-session while using the I To refresh the IDE mid-development, click on either the green 'ready' signal or the red 'compilation error' message at the bottom right corner of the IDE. A new modal will pop up, and you should select the Refresh IDE button. 
This will load your environment variable values into your development environment. - + There are some known issues with partial parsing of a project and changing environment variables mid-session in the IDE. If you find that your dbt project is not compiling to the values you've set, try deleting the `target/partial_parse.msgpack` file in your dbt project, which will force dbt to re-compile your whole project. @@ -86,7 +86,7 @@ There are some known issues with partial parsing of a project and changing envir While all environment variables are encrypted at rest in dbt Cloud, dbt Cloud has additional capabilities for managing environment variables with secret or otherwise sensitive values. If you want a particular environment variable to be scrubbed from all logs and error messages, in addition to obfuscating the value in the UI, you can prefix the key with `DBT_ENV_SECRET_`. This functionality is supported in `dbt v1.0` and later. - + **Note**: An environment variable can be used to store a [git token for repo cloning](/docs/build/environment-variables#clone-private-packages). We recommend you make the git token's permissions read-only and consider using a machine account or service user's PAT with limited repo access in order to practice good security hygiene. @@ -131,7 +131,7 @@ Currently, it's not possible to dynamically set environment variables across mod **Note** — You can also use this method with Databricks SQL Warehouse. - + :::info Environment variables and Snowflake OAuth limitations Env vars work fine with username/password and keypair, including scheduled jobs, because dbt Core consumes the Jinja inserted into the autogenerated `profiles.yml` and resolves it to do an `env_var` lookup.
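The order of precedence described in the `environment-variables.md` hunks above (the default argument to `env_var`, then the project default, then the environment-level value, then a job-level or personal override) can be pictured as a simple lookup. The following is a minimal sketch in Python, not dbt Cloud's implementation: the function name `resolve_env_var` and its dictionary arguments are hypothetical, introduced only for illustration.

```python
from typing import Mapping, Optional


def resolve_env_var(
    name: str,
    code_default: Optional[str] = None,
    project_default: Optional[Mapping[str, str]] = None,
    environment_values: Optional[Mapping[str, str]] = None,
    override_values: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the value for `name`, checking the highest-precedence level first.

    All arguments are hypothetical stand-ins for the levels named in the docs:
    override_values    -> job-level or personal (IDE) override
    environment_values -> value set for a specific environment
    project_default    -> the "Project Default" column
    code_default       -> default argument passed to env_var() in code
    """
    # Walk from highest to lowest precedence and stop at the first match.
    for level in (override_values, environment_values, project_default):
        if level is not None and name in level:
            return level[name]
    # Lowest precedence: the default supplied in code, if any.
    if code_default is not None:
        return code_default
    # Mirrors the compilation error quoted in the docs when nothing is set.
    raise KeyError(f"Env var required but not provided: '{name}'")


# Example: the environment-level value wins over the project default.
project = {"DBT_WAREHOUSE": "transforming"}
staging = {"DBT_WAREHOUSE": "staging_wh"}
print(resolve_env_var("DBT_WAREHOUSE", "fallback", project, staging))  # staging_wh
```

Checking the levels from highest to lowest keeps the rule easy to audit: the first level that defines the variable decides its value, and the error is raised only when no level, including the in-code default, provides one.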
diff --git a/website/docs/docs/build/exposures.md b/website/docs/docs/build/exposures.md index a26ac10bd36..65c0792e0a0 100644 --- a/website/docs/docs/build/exposures.md +++ b/website/docs/docs/build/exposures.md @@ -118,8 +118,8 @@ dbt test -s +exposure:weekly_jaffle_report When we generate our documentation site, you'll see the exposure appear: - - + + ## Related docs diff --git a/website/docs/docs/build/python-models.md b/website/docs/docs/build/python-models.md index b24d3129f0c..3fe194a4cb7 100644 --- a/website/docs/docs/build/python-models.md +++ b/website/docs/docs/build/python-models.md @@ -660,7 +660,7 @@ Use the `cluster` submission method with dedicated Dataproc clusters you or your - Enable Dataproc APIs for your project + region - If using the `cluster` submission method: Create or use an existing [Dataproc cluster](https://cloud.google.com/dataproc/docs/guides/create-cluster) with the [Spark BigQuery connector initialization action](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/connectors#bigquery-connectors). (Google recommends copying the action into your own Cloud Storage bucket, rather than using the example version shown in the screenshot) - + The following configurations are needed to run Python models on Dataproc. You can add these to your [BigQuery profile](/docs/core/connect-data-platform/bigquery-setup#running-python-models-on-dataproc) or configure them on specific Python models: - `gcs_bucket`: Storage bucket to which dbt will upload your model's compiled PySpark code. @@ -706,7 +706,7 @@ Google recommends installing Python packages on Dataproc clusters via initializa You can also install packages at cluster creation time by [defining cluster properties](https://cloud.google.com/dataproc/docs/tutorials/python-configuration#image_version_20): `dataproc:pip.packages` or `dataproc:conda.packages`. 
- + **Docs:** - [Dataproc overview](https://cloud.google.com/dataproc/docs/concepts/overview) diff --git a/website/docs/docs/build/semantic-models.md b/website/docs/docs/build/semantic-models.md index d19354d9199..5c6883cdcee 100644 --- a/website/docs/docs/build/semantic-models.md +++ b/website/docs/docs/build/semantic-models.md @@ -18,7 +18,7 @@ Semantic models are the foundation for data definition in MetricFlow, which powe - Configure semantic models in a YAML file within your dbt project directory. - Organize them under a `metrics:` folder or within project sources as needed. - + Semantic models have 6 components and this page explains the definitions with some examples: diff --git a/website/docs/docs/build/sources.md b/website/docs/docs/build/sources.md index e4fb10ac725..466bcedc688 100644 --- a/website/docs/docs/build/sources.md +++ b/website/docs/docs/build/sources.md @@ -84,7 +84,7 @@ left join raw.jaffle_shop.customers using (customer_id) Using the `{{ source () }}` function also creates a dependency between the model and the source table. - + ### Testing and documenting sources You can also: @@ -189,7 +189,7 @@ from raw.jaffle_shop.orders The results of this query are used to determine whether the source is fresh or not: - + ### Filter diff --git a/website/docs/docs/build/sql-models.md b/website/docs/docs/build/sql-models.md index d33e4798974..a0dd174278b 100644 --- a/website/docs/docs/build/sql-models.md +++ b/website/docs/docs/build/sql-models.md @@ -254,7 +254,7 @@ create view analytics.customers as ( dbt uses the `ref` function to: * Determine the order to run the models by creating a directed acyclic graph (DAG). - + * Manage separate environments — dbt will replace the model specified in the `ref` function with the database name for the table (or view). Importantly, this is environment-aware — if you're running dbt with a target schema named `dbt_alice`, it will select from an upstream table in the same schema.
Check out the tabs above to see this in action. diff --git a/website/docs/docs/cloud/about-cloud-develop-defer.md b/website/docs/docs/cloud/about-cloud-develop-defer.md index f6478c83970..37bfaacfd0c 100644 --- a/website/docs/docs/cloud/about-cloud-develop-defer.md +++ b/website/docs/docs/cloud/about-cloud-develop-defer.md @@ -36,7 +36,7 @@ To enable defer in the dbt Cloud IDE, toggle the **Defer to production** button For example, if you were to start developing on a new branch with [nothing in your development schema](/reference/node-selection/defer#usage), edit a single model, and run `dbt build -s state:modified` — only the edited model would run. Any `{{ ref() }}` functions will point to the production location of the referenced models. - + ### Defer in dbt Cloud CLI diff --git a/website/docs/docs/cloud/connect-data-platform/about-connections.md b/website/docs/docs/cloud/connect-data-platform/about-connections.md index bc4a515112d..93bbf83584f 100644 --- a/website/docs/docs/cloud/connect-data-platform/about-connections.md +++ b/website/docs/docs/cloud/connect-data-platform/about-connections.md @@ -22,7 +22,7 @@ import MSCallout from '/snippets/_microsoft-adapters-soon.md'; You can connect to your database in dbt Cloud by clicking the gear in the top right and selecting **Account Settings**. From the Account Settings page, click **+ New Project**. - + These connection instructions provide the basic fields required for configuring a data platform connection in dbt Cloud. 
For more detailed guides, which include demo project data, read our [Quickstart guides](https://docs.getdbt.com/guides) diff --git a/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md b/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md index eecf0a8e229..0186d821a54 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md @@ -36,4 +36,4 @@ HTTP and Thrift connection methods: | Auth | Optional, supply if using Kerberos | `KERBEROS` | | Kerberos Service Name | Optional, supply if using Kerberos | `hive` | - + diff --git a/website/docs/docs/cloud/connect-data-platform/connect-databricks.md b/website/docs/docs/cloud/connect-data-platform/connect-databricks.md index ebf6be63bd1..032246ad16a 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-databricks.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-databricks.md @@ -37,4 +37,4 @@ To set up the Databricks connection, supply the following fields: | HTTP Path | The HTTP path of the Databricks cluster or SQL warehouse | /sql/1.0/warehouses/1a23b4596cd7e8fg | | Catalog | Name of Databricks Catalog (optional) | Production | - + diff --git a/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md b/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md index 2109e281e6a..06b9dd62f1a 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md @@ -15,7 +15,7 @@ The following fields are required when creating a Postgres, Redshift, or AlloyDB **Note**: When you set up a Redshift or Postgres connection in dbt Cloud, SSL-related parameters aren't available as inputs. - + For dbt Cloud users, please log in using the default Database username and password. 
Note this is because [`IAM` authentication](https://docs.aws.amazon.com/redshift/latest/mgmt/generating-user-credentials.html) is not compatible with dbt Cloud. @@ -25,7 +25,7 @@ To connect to a Postgres, Redshift, or AlloyDB instance via an SSH tunnel, selec Once the connection is saved, a public key will be generated and displayed for the Connection. You can copy this public key to the bastion server to authorize dbt Cloud to connect to your database via the bastion server. - + #### About the Bastion server in AWS diff --git a/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md b/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md index 05f0c1dc07a..c265529fb49 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md @@ -27,7 +27,7 @@ username (specifically, the `login_name`) and the corresponding user's Snowflake to authenticate dbt Cloud to run queries against Snowflake on behalf of a Snowflake user. **Note**: The schema field in the **Developer Credentials** section is a required field. - + ### Key Pair @@ -59,7 +59,7 @@ As of dbt version 1.5.0, you can use a `private_key` string in place of `private -----END ENCRYPTED PRIVATE KEY----- ``` - + ### Snowflake OAuth @@ -68,7 +68,7 @@ As of dbt version 1.5.0, you can use a `private_key` string in place of `private The OAuth auth method permits dbt Cloud to run development queries on behalf of a Snowflake user without the configuration of Snowflake password in dbt Cloud. For more information on configuring a Snowflake OAuth connection in dbt Cloud, please see [the docs on setting up Snowflake OAuth](/docs/cloud/manage-access/set-up-snowflake-oauth). 
- + ## Configuration diff --git a/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md b/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md index 2e637b7450a..7ea6e380000 100644 --- a/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md +++ b/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md @@ -32,7 +32,7 @@ In addition to these fields, there are two other optional fields that can be con - + ### BigQuery OAuth **Available in:** Development environments, Enterprise plans only @@ -43,7 +43,7 @@ more information on the initial configuration of a BigQuery OAuth connection in [the docs on setting up BigQuery OAuth](/docs/cloud/manage-access/set-up-bigquery-oauth). As an end user, if your organization has set up BigQuery OAuth, you can link a project with your personal BigQuery account in your personal Profile in dbt Cloud, like so: - + ## Configuration diff --git a/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md b/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md index 3a9f8d9e872..57146ec513a 100644 --- a/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md +++ b/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md @@ -22,7 +22,7 @@ These [features](#dbt-cloud-ide-features) create a powerful editing environment - + @@ -83,7 +83,7 @@ There are three start-up states when using or launching the Cloud IDE: The Cloud IDE needs explicit action to save your changes. There are three ways your work is stored: - **Unsaved, local code —** The browser stores your code only in its local storage. In this state, you might need to commit any unsaved changes in order to switch branches or browsers. If you have saved and committed changes, you can access the "Change branch" option even if there are unsaved changes. But if you attempt to switch branches without saving changes, a warning message will appear, notifying you that you will lose any unsaved changes. 
- + - **Saved but uncommitted code —** When you save a file, the data gets stored in durable, long-term storage, but isn't synced back to git. To switch branches using the **Change branch** option, you must "Commit and sync" or "Revert" changes. Changing branches isn't available for saved-but-uncommitted code. This is to ensure your uncommitted changes don't get lost. - **Committed code —** This is stored in the branch with your git provider and you can check out other (remote) branches. @@ -108,7 +108,7 @@ Set up your developer credentials: 4. Enter the details under **Development Credentials**. 5. Click **Save.** - + 6. Access the Cloud IDE by clicking **Develop** at the top of the page. @@ -124,7 +124,7 @@ If a model or test fails, dbt Cloud makes it easy for you to view and download t Use dbt's [rich model selection syntax](/reference/node-selection/syntax) to [run dbt commands](/reference/dbt-commands) directly within dbt Cloud. - + ## Build and view your project's docs diff --git a/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md b/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md index c99b4fdc0c3..2038d4ad64c 100644 --- a/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md +++ b/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md @@ -10,13 +10,13 @@ The [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud) is a tool fo This page offers comprehensive definitions and terminology of user interface elements, allowing you to navigate the IDE landscape with ease. - + ## Basic layout The IDE streamlines your workflow, and features a popular user interface layout with files and folders on the left, editor on the right, and command and console information at the bottom. - + 1. **Git repository link —** Clicking the Git repository link, located on the upper left of the IDE, takes you to your repository on the same active branch. 
* **Note:** This feature is only available for GitHub or GitLab repositories on multi-tenant dbt Cloud accounts. @@ -36,7 +36,7 @@ The IDE streamlines your workflow, and features a popular user interface layout * Added (A) — The IDE detects added files * Deleted (D) — The IDE detects deleted files. - + 5. **Command bar —** The Command bar, located in the lower left of the IDE, is used to invoke [dbt commands](/reference/dbt-commands). When a command is invoked, the associated logs are shown in the Invocation History Drawer. @@ -49,7 +49,7 @@ The IDE streamlines your workflow, and features a popular user interface layout The IDE features some delightful tools and layouts to make it easier for you to write dbt code and collaborate with teammates. - + 1. **File Editor —** The File Editor is where users edit code. Tabs break out the region for each opened file, and unsaved files are marked with a blue dot icon in the tab view. @@ -61,29 +61,29 @@ The IDE features some delightful tools and layouts to make it easier for you to - **Version Control Options menu —** Below the Git Actions button, the **Changes** section, which lists all file changes since the last commit. You can click on a change to open the Git Diff View to see the inline changes. You can also right-click any file and use the file-specific options in the Version Control Options menu. - + ## Additional editing features - **Minimap —** A Minimap (code outline) gives you a high-level overview of your source code, which is useful for quick navigation and code understanding. A file's minimap is displayed on the upper-right side of the editor. To quickly jump to different sections of your file, click the shaded area. - + - **dbt Editor Command Palette —** The dbt Editor Command Palette displays text editing actions and their associated keyboard shortcuts. This can be accessed by pressing `F1` or right-clicking in the text editing area and selecting Command Palette. 
- + - **Git Diff View —** Clicking on a file in the **Changes** section of the **Version Control Menu** will open the changed file with Git Diff view. The editor will show the previous version on the left and the in-line changes made on the right. - + - **Markdown Preview console tab —** The Markdown Preview console tab shows a preview of your .md file's markdown code in your repository and updates it automatically as you edit your code. - + - **CSV Preview console tab —** The CSV Preview console tab displays the data from your CSV file in a table, which updates automatically as you edit the file in your seed directory. - + ## Console section The console section, located below the File editor, includes various console tabs and buttons to help you with tasks such as previewing, compiling, building, and viewing the DAG. Refer to the following sub-bullets for more details on the console tabs and buttons. - + 1. **Preview button —** When you click on the Preview button, it runs the SQL in the active file editor regardless of whether you have saved it or not and sends the results to the **Results** console tab. You can preview a selected portion of saved or unsaved code by highlighting it and then clicking the **Preview** button. @@ -107,17 +107,17 @@ Starting from dbt v1.6 or higher, when you save changes to a model, you can comp 3. **Format button —** The editor has a **Format** button that can reformat the contents of your files. For SQL files, it uses either `sqlfmt` or `sqlfluff`, and for Python files, it uses `black`. 5. **Results tab —** The Results console tab displays the most recent Preview results in tabular format. - + 6. **Compiled Code tab —** The Compile button triggers a compile invocation that generates compiled code, which is displayed in the Compiled Code tab. - + 7. **Lineage tab —** The Lineage tab in the File Editor displays the active model's lineage or DAG.
By default, it shows two degrees of lineage in both directions (`2+model_name+2`), however, you can change it to +model+ (full DAG). - Double-click a node in the DAG to open that file in a new tab - Expand or shrink the DAG using node selection syntax. - Note, the `--exclude` flag isn't supported. - + ## Invocation history @@ -128,7 +128,7 @@ You can open the drawer in multiple ways: - Typing a dbt command and pressing enter - Or pressing Control-backtick (or Ctrl + `) - + 1. **Invocation History list —** The left-hand panel of the Invocation History Drawer displays a list of previous invocations in the IDE, including the command, branch name, command status, and elapsed time. @@ -138,7 +138,7 @@ You can open the drawer in multiple ways: 4. **Command Control button —** Use the Command Control button, located on the right side, to control your invocation and cancel or rerun a selected run. - + 5. **Node Summary tab —** Clicking on the Results Status Tabs will filter the Node Status List based on their corresponding status. The available statuses are Pass (successful invocation of a node), Warn (test executed with a warning), Error (database error or test failure), Skip (nodes not run due to upstream error), and Queued (nodes that have not executed yet). @@ -150,25 +150,25 @@ You can open the drawer in multiple ways: ## Modals and Menus Use menus and modals to interact with IDE and access useful options to help your development workflow. -- **Editor tab menu —** To interact with open editor tabs, right-click any tab to access the helpful options in the file tab menu. +- **Editor tab menu —** To interact with open editor tabs, right-click any tab to access the helpful options in the file tab menu. - **File Search —** You can easily search for and navigate between files using the File Navigation menu, which can be accessed by pressing Command-O or Control-O or clicking on the 🔍 icon in the File Explorer. 
- + - **Global Command Palette—** The Global Command Palette provides helpful shortcuts to interact with the IDE, such as git actions, specialized dbt commands, and compile, and preview actions, among others. To open the menu, use Command-P or Control-P. - + - **IDE Status modal —** The IDE Status modal shows the current error message and debug logs for the server. This also contains an option to restart the IDE. Open this by clicking on the IDE Status button. - + - **Commit Changes modal —** The Commit Changes modal is accessible via the Git Actions button to commit all changes or via the Version Control Options menu to commit individual changes. Once you enter a commit message, you can use the modal to commit and sync the selected changes. - + - **Change Branch modal —** The Change Branch modal allows users to switch git branches in the IDE. It can be accessed through the `Change Branch` link or the Git Actions button in the Version Control menu. - + - **Revert Uncommitted Changes modal —** The Revert Uncommitted Changes modal is how users revert changes in the IDE. This is accessible via the `Revert File` option above the Version Control Options menu, or via the Git Actions button when there are saved, uncommitted changes in the IDE. - + - **IDE Options menu —** The IDE Options menu can be accessed by clicking on the three-dot menu located at the bottom right corner of the IDE. This menu contains global options such as: @@ -177,4 +177,4 @@ Use menus and modals to interact with IDE and access useful options to help your * Fully recloning your repository to refresh your git state and view status details * Viewing status details, including the IDE Status modal. 
- + diff --git a/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md b/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md index e1ff64faf2b..f6f2265a922 100644 --- a/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md +++ b/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md @@ -26,15 +26,15 @@ By default, the IDE uses sqlfmt rules to format your code, making it convenient - + - + - + - + - + @@ -63,7 +63,7 @@ Linting doesn't support ephemeral models in dbt v1.5 and lower. Refer to the [FA - **Fix** button — Automatically fixes linting errors in the **File editor**. When fixing is complete, you'll see a message confirming the outcome. - Use the **Code Quality** tab to view and debug any code errors. - + ### Customize linting @@ -130,7 +130,7 @@ group_by_and_order_by_style = implicit For more info on styling best practices, refer to [How we style our SQL](/best-practices/how-we-style/2-how-we-style-our-sql). ::: - + ## Format @@ -158,7 +158,7 @@ To enable sqlfmt: 6. Once you've selected the **sqlfmt** radio button, go to the console section (located below the **File editor**) to select the **Format** button. 7. The **Format** button auto-formats your code in the **File editor**. Once you've auto-formatted, you'll see a message confirming the outcome. - + ### Format YAML, Markdown, JSON @@ -169,7 +169,7 @@ To format your YAML, Markdown, or JSON code, dbt Cloud integrates with [Prettier 3. In the console section (located below the **File editor**), select the **Format** button to auto-format your code in the **File editor**. Use the **Code Quality** tab to view code errors. 4. Once you've auto-formatted, you'll see a message confirming the outcome. - + You can add a configuration file to customize formatting rules for YAML, Markdown, or JSON files using Prettier. The IDE looks for the configuration file based on an order of precedence. For example, it first checks for a "prettier" key in your `package.json` file. 
@@ -185,7 +185,7 @@ To format your Python code, dbt Cloud integrates with [Black](https://black.read 3. In the console section (located below the **File editor**), select the **Format** button to auto-format your code in the **File editor**. 4. Once you've auto-formatted, you'll see a message confirming the outcome. - + ## FAQs diff --git a/website/docs/docs/cloud/git/authenticate-azure.md b/website/docs/docs/cloud/git/authenticate-azure.md index bbb2cff8b29..42028bf993b 100644 --- a/website/docs/docs/cloud/git/authenticate-azure.md +++ b/website/docs/docs/cloud/git/authenticate-azure.md @@ -16,11 +16,11 @@ Connect your dbt Cloud profile to Azure DevOps using OAuth: 1. Click the gear icon at the top right and select **Profile settings**. 2. Click **Linked Accounts**. 3. Next to Azure DevOps, click **Link**. - + 4. Once you're redirected to Azure DevOps, sign into your account. 5. When you see the permission request screen from Azure DevOps App, click **Accept**. - + You will be directed back to dbt Cloud, and your profile should be linked. You are now ready to develop in dbt Cloud! diff --git a/website/docs/docs/cloud/git/connect-github.md b/website/docs/docs/cloud/git/connect-github.md index 715f23912e5..ff0f2fff18f 100644 --- a/website/docs/docs/cloud/git/connect-github.md +++ b/website/docs/docs/cloud/git/connect-github.md @@ -30,13 +30,13 @@ To connect your dbt Cloud account to your GitHub account: 2. Select **Linked Accounts** from the left menu. - + 3. In the **Linked Accounts** section, set up your GitHub account connection to dbt Cloud by clicking **Link** to the right of GitHub. This redirects you to your account on GitHub where you will be asked to install and configure the dbt Cloud application. 4. Select the GitHub organization and repositories dbt Cloud should access. - + 5. 
Assign the dbt Cloud GitHub App the following permissions: - Read access to metadata @@ -52,7 +52,7 @@ To connect your dbt Cloud account to your GitHub account: ## Limiting repository access in GitHub If you are your GitHub organization owner, you can also configure the dbt Cloud GitHub application to have access to only select repositories. This configuration must be done in GitHub, but we provide an easy link in dbt Cloud to start this process. - + ## Personally authenticate with GitHub @@ -70,7 +70,7 @@ To connect a personal GitHub account: 2. Select **Linked Accounts** in the left menu. If your GitHub account is not connected, you’ll see "No connected account". 3. Select **Link** to begin the setup process. You’ll be redirected to GitHub, and asked to authorize dbt Cloud in a grant screen. - + 4. Once you approve authorization, you will be redirected to dbt Cloud, and you should now see your connected account. diff --git a/website/docs/docs/cloud/git/connect-gitlab.md b/website/docs/docs/cloud/git/connect-gitlab.md index 316e6af0135..e55552e2d86 100644 --- a/website/docs/docs/cloud/git/connect-gitlab.md +++ b/website/docs/docs/cloud/git/connect-gitlab.md @@ -22,11 +22,11 @@ To connect your GitLab account: 2. Select **Linked Accounts** in the left menu. 3. Click **Link** to the right of your GitLab account. - + When you click **Link**, you will be redirected to GitLab and prompted to sign into your account. GitLab will then ask for your explicit authorization: - + Once you've accepted, you should be redirected back to dbt Cloud, and you'll see that your account has been linked to your profile. @@ -52,7 +52,7 @@ For more detail, GitLab has a [guide for creating a Group Application](https://d In GitLab, navigate to your group settings and select **Applications**. Here you'll see a form to create a new application. 
- + In GitLab, when creating your Group Application, input the following: @@ -67,7 +67,7 @@ Replace `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cl The application form in GitLab should look as follows when completed: - + Click **Save application** in GitLab, and GitLab will then generate an **Application ID** and **Secret**. These values will be available even if you close the app screen, so this is not the only chance you have to save them. @@ -76,7 +76,7 @@ If you're a Business Critical customer using [IP restrictions](/docs/cloud/secur ### Adding the GitLab OAuth application to dbt Cloud After you've created your GitLab application, you need to provide dbt Cloud information about the app. In dbt Cloud, account admins should navigate to **Account Settings**, click on the **Integrations** tab, and expand the GitLab section. - + In dbt Cloud, input the following values: @@ -92,7 +92,7 @@ Once the form is complete in dbt Cloud, click **Save**. You will then be redirected to GitLab and prompted to sign into your account. GitLab will ask for your explicit authorization: - + Once you've accepted, you should be redirected back to dbt Cloud, and your integration is ready for developers on your team to [personally authenticate with](#personally-authenticating-with-gitlab). @@ -103,7 +103,7 @@ To connect a personal GitLab account, dbt Cloud developers should navigate to Yo If your GitLab account is not connected, you’ll see "No connected account". Select **Link** to begin the setup process. You’ll be redirected to GitLab, and asked to authorize dbt Cloud in a grant screen. - + Once you approve authorization, you will be redirected to dbt Cloud, and you should see your connected account. You're now ready to start developing in the dbt Cloud IDE or dbt Cloud CLI. 
diff --git a/website/docs/docs/cloud/git/import-a-project-by-git-url.md b/website/docs/docs/cloud/git/import-a-project-by-git-url.md index 2ccaba1ec4d..83846bb1f0b 100644 --- a/website/docs/docs/cloud/git/import-a-project-by-git-url.md +++ b/website/docs/docs/cloud/git/import-a-project-by-git-url.md @@ -37,7 +37,7 @@ If you use GitHub, you can import your repo directly using [dbt Cloud's GitHub A - After adding this key, dbt Cloud will be able to read and write files in your dbt project. - Refer to [Adding a deploy key in GitHub](https://github.blog/2015-06-16-read-only-deploy-keys/) - + ## GitLab @@ -52,7 +52,7 @@ If you use GitLab, you can import your repo directly using [dbt Cloud's GitLab A - After saving this SSH key, dbt Cloud will be able to read and write files in your GitLab repository. - Refer to [Adding a read only deploy key in GitLab](https://docs.gitlab.com/ee/ssh/#per-repository-deploy-keys) - + ## BitBucket @@ -60,7 +60,7 @@ If you use GitLab, you can import your repo directly using [dbt Cloud's GitLab A - Next, click the **Add key** button and paste in the deploy key generated by dbt Cloud for your repository. - After saving this SSH key, dbt Cloud will be able to read and write files in your BitBucket repository. - + ## AWS CodeCommit @@ -109,17 +109,17 @@ If you use Azure DevOps and you are on the dbt Cloud Enterprise plan, you can im 2. We recommend using a dedicated service user for the integration to ensure that dbt Cloud's connection to Azure DevOps is not interrupted by changes to user permissions. - + 3. Next, click the **+ New Key** button to create a new SSH key for the repository. - + 4. Select a descriptive name for the key and then paste in the deploy key generated by dbt Cloud for your repository. 5. After saving this SSH key, dbt Cloud will be able to read and write files in your Azure DevOps repository. 
- + ## Other git providers diff --git a/website/docs/docs/cloud/git/setup-azure.md b/website/docs/docs/cloud/git/setup-azure.md index b24ec577935..843371be6ea 100644 --- a/website/docs/docs/cloud/git/setup-azure.md +++ b/website/docs/docs/cloud/git/setup-azure.md @@ -34,11 +34,11 @@ Many customers ask why they need to select Multitenant instead of Single tenant, 6. Add a redirect URI by selecting **Web** and, in the field, entering `https://YOUR_ACCESS_URL/complete/azure_active_directory`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cloud/regions-ip-addresses) for your region and plan. 7. Click **Register**. - + Here's what your app should look like before registering it: - + ## Add permissions to your new app @@ -51,7 +51,7 @@ Provide your new app access to Azure DevOps: 4. Select **Azure DevOps**. 5. Select the **user_impersonation** permission. This is the only permission available for Azure DevOps. - + ## Add another redirect URI @@ -63,7 +63,7 @@ You also need to add another redirect URI to your Azure AD application. This red `https://YOUR_ACCESS_URL/complete/azure_active_directory_service_user` 4. Click **Save**. - + @@ -77,7 +77,7 @@ If you have already connected your Azure DevOps account to Active Directory, the 4. Select the directory you want to connect. 5. Click **Connect**. - + ## Add your Azure AD app to dbt Cloud @@ -91,7 +91,7 @@ Once you connect your Azure AD app and Azure DevOps, you need to provide dbt Clo - **Application (client) ID:** Found in the Azure AD App. - **Client Secrets:** You need to first create a secret in the Azure AD App under **Client credentials**. Make sure to copy the **Value** field in the Azure AD App and paste it in the **Client Secret** field in dbt Cloud. You are responsible for the Azure AD app secret expiration and rotation. - **Directory(tenant) ID:** Found in the Azure AD App. - + Your Azure AD app should now be added to your dbt Cloud Account. 
People on your team who want to develop in the dbt Cloud IDE or dbt Cloud CLI can now personally [authorize Azure DevOps from their profiles](/docs/cloud/git/authenticate-azure). @@ -345,7 +345,7 @@ To connect the service user: 2. The admin should click **Link Azure Service User** in dbt Cloud. 3. The admin will be directed to Azure DevOps and must accept the Azure AD app's permissions. 4. Finally, the admin will be redirected to dbt Cloud, and the service user will be connected. - + Once connected, dbt Cloud displays the email address of the service user so you know which user's permissions are enabling headless actions in deployment environments. To change which account is connected, disconnect the profile in dbt Cloud, sign into the alternative Azure DevOps service account, and re-link the account in dbt Cloud. diff --git a/website/docs/docs/cloud/manage-access/audit-log.md b/website/docs/docs/cloud/manage-access/audit-log.md index 7170ee95ebd..774400529e9 100644 --- a/website/docs/docs/cloud/manage-access/audit-log.md +++ b/website/docs/docs/cloud/manage-access/audit-log.md @@ -20,7 +20,7 @@ The dbt Cloud audit log stores all the events that occurred in your organization To access the audit log, click the gear icon in the top right, then click **Audit Log**. - + ## Understanding the audit log @@ -160,7 +160,7 @@ The audit log supports various events for different objects in dbt Cloud. You wi You can search the audit log to find a specific event or actor, which is limited to the ones listed in [Events in audit log](#events-in-audit-log). The audit log successfully lists historical events spanning the last 90 days. You can search for an actor or event using the search bar, and then narrow your results using the time window. - + ## Exporting logs @@ -171,7 +171,7 @@ You can use the audit log to export all historical audit results for security, c - **For events beyond 90 days** — Select **Export All**. 
The Account Admin will receive an email link to download a CSV file of all the events that occurred in your organization. - + ### Azure Single-tenant diff --git a/website/docs/docs/cloud/manage-access/auth0-migration.md b/website/docs/docs/cloud/manage-access/auth0-migration.md index 610c97e8b74..a40bb006d06 100644 --- a/website/docs/docs/cloud/manage-access/auth0-migration.md +++ b/website/docs/docs/cloud/manage-access/auth0-migration.md @@ -17,11 +17,11 @@ If you have not yet configured SSO in dbt Cloud, refer instead to our setup guid The Auth0 migration feature is being rolled out incrementally to customers who have SSO features already enabled. When the migration option has been enabled on your account, you will see **SSO Updates Available** on the right side of the menu bar, near the settings icon. - + Alternatively, you can start the process from the **Settings** page in the **Single Sign-on** pane. Click the **Begin Migration** button to start. - + Once you have opted to begin the migration process, the following steps will vary depending on the configured identity provider. You can just skip to the section that's right for your environment. These steps only apply to customers going through the migration; new setups will use the existing [setup instructions](/docs/cloud/manage-access/sso-overview). @@ -48,15 +48,15 @@ Below are sample steps to update. You must complete all of them to ensure uninte Here is an example of an updated SAML 2.0 setup in Okta. - + 2. Save the configuration, and your SAML settings will look something like this: - + 3. Toggle the `Enable new SSO authentication` option to ensure the traffic is routed correctly. _The new SSO migration action is final and cannot be undone_ - + 4. Save the settings and test the new configuration using the SSO login URL provided on the settings page. @@ -68,17 +68,17 @@ Below are steps to update. You must complete all of them to ensure uninterrupted 1. 
Open the [Google Cloud console](https://console.cloud.google.com/) and select the project with your dbt Cloud single sign-on settings. From the project page **Quick Access**, select **APIs and Services** - + 2. Click **Credentials** from the left side pane and click the appropriate name from **OAuth 2.0 Client IDs** - + 3. In the **Client ID for Web application** window, find the **Authorized Redirect URIs** field and click **Add URI** and enter `https:///login/callback`. Click **Save** once you are done. - + 4. _You will need a person with Google Workspace admin privileges to complete these steps in dbt Cloud_. In dbt Cloud, navigate to the **Account Settings**, click on **Single Sign-on**, and then click **Edit** on the right side of the SSO pane. Toggle the **Enable New SSO Authentication** option and select **Save**. This will trigger an authorization window from Google that will require admin credentials. _The migration action is final and cannot be undone_. Once the authentication has gone through, test the new configuration using the SSO login URL provided on the settings page. @@ -88,7 +88,7 @@ You must complete the domain authorization before you toggle `Enable New SSO Aut ::: - + ## Azure Active Directory @@ -98,15 +98,15 @@ Below are steps to update. You must complete all of them to ensure uninterrupted 1. Click **App Registrations** on the left side menu. - + 2. Select the proper **dbt Cloud** app (name may vary) from the list. From the app overview, click on the hyperlink next to **Redirect URI** - + 3. In the **Web** pane with **Redirect URIs**, click **Add URI** and enter the appropriate `https:///login/callback`. Save the settings and verify it is counted in the updated app overview. - + 4. Navigate to the dbt Cloud environment and open the **Account Settings**. Click the **Single Sign-on** option from the left side menu and click the **Edit** option from the right side of the SSO pane. 
The **domain** field is the domain your organization uses to login to Azure AD. Toggle the **Enable New SSO Authentication** option and **Save**. _Once this option is enabled, it cannot be undone._ @@ -116,4 +116,4 @@ You must complete the domain authorization before you toggle `Enable New SSO Aut ::: - + diff --git a/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md b/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md index adf1ff208cc..63786f40bd8 100644 --- a/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md +++ b/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md @@ -45,7 +45,7 @@ If you're on an Enterprise plan and have the correct [permissions](/docs/cloud/m - To add a user, go to **Account Settings**, select **Users** under **Teams**. Select [**Invite Users**](docs/cloud/manage-access/invite-users). For fine-grained permission configuration, refer to [Role based access control](/docs/cloud/manage-access/enterprise-permissions). - + @@ -66,14 +66,14 @@ To add a user in dbt Cloud, you must be an account owner or have admin privilege 1. From dbt Cloud, click the gear icon at the top right and select **Account Settings**. - + 2. In **Account Settings**, select **Billing**. 3. Enter the number of developer seats you want and make sure you fill in all the payment details, including the **Billing Address** section. Leaving these blank won't allow you to save your changes. 4. Press **Update Payment Information** to save your changes. - + Now that you've updated your billing, you can now [invite users](/docs/cloud/manage-access/invite-users) to join your dbt Cloud account: @@ -87,13 +87,13 @@ To delete a user in dbt Cloud, you must be an account owner or have admin privil 1. From dbt Cloud, click the gear icon at the top right and select **Account Settings**. - + 2. In **Account Settings**, select **Users** under **Teams**. 3. Select the user you want to delete, then click **Edit**. 4. 
Click **Delete** in the bottom left. Click **Confirm Delete** to immediately delete the user without additional password prompts. This action cannot be undone. However, you can re-invite the user with the same information if the deletion was made in error. - + If you are on a **Teams** plan and you're deleting users to reduce the number of billable seats, follow these steps to lower the license count to avoid being overcharged: @@ -102,7 +102,7 @@ If you are on a **Teams** plan and you're deleting users to reduce the number of 2. Enter the number of developer seats you want and make sure you fill in all the payment details, including the **Billing Address** section. If you leave any field blank, you won't be able to save your changes. 3. Click **Update Payment Information** to save your changes. - + Great work! After completing these steps, your dbt Cloud user count and billing count should now be the same. @@ -130,7 +130,7 @@ to allocate for the user. If your account does not have an available license to allocate, you will need to add more licenses to your plan to complete the license change. - ### Mapped configuration @@ -149,7 +149,7 @@ license. To assign Read-Only licenses to certain groups of users, create a new License Mapping for the Read-Only license type and include a comma separated list of IdP group names that should receive a Read-Only license at sign-in time. - Usage notes: diff --git a/website/docs/docs/cloud/manage-access/enterprise-permissions.md b/website/docs/docs/cloud/manage-access/enterprise-permissions.md index ac2d6258819..dcacda20deb 100644 --- a/website/docs/docs/cloud/manage-access/enterprise-permissions.md +++ b/website/docs/docs/cloud/manage-access/enterprise-permissions.md @@ -28,11 +28,11 @@ Role-Based Access Control (RBAC) is helpful for automatically assigning permissi 1. Click the gear icon to the top right and select **Account Settings**. From the **Team** section, click **Groups** - + 1. 
Select an existing group or create a new group to add RBAC. Name the group (this can be any name you like, but it's recommended to keep it consistent with the SSO groups). If you have configured SSO with SAML 2.0, you may have to use the GroupID instead of the name of the group. 2. Configure the SSO provider groups you want to add RBAC by clicking **Add** in the **SSO** section. These fields are case-sensitive and must match the source group formatting. 3. Configure the permissions for users within those groups by clicking **Add** in the **Access** section of the window. - + 4. When you've completed your configurations, click **Save**. Users will begin to populate the group automatically once they have signed in to dbt Cloud with their SSO credentials. diff --git a/website/docs/docs/cloud/manage-access/invite-users.md b/website/docs/docs/cloud/manage-access/invite-users.md index f79daebf45e..21be7010a30 100644 --- a/website/docs/docs/cloud/manage-access/invite-users.md +++ b/website/docs/docs/cloud/manage-access/invite-users.md @@ -20,11 +20,11 @@ You must have proper permissions to invite new users: 1. In your dbt Cloud account, select the gear menu in the upper right corner and then select **Account Settings**. 2. From the left sidebar, select **Users**. - + 3. Click on **Invite Users**. - + 4. In the **Email Addresses** field, enter the email addresses of the users you would like to invite separated by comma, semicolon, or a new line. 5. Select the license type for the batch of users from the **License** dropdown. @@ -40,7 +40,7 @@ dbt Cloud generates and sends emails from `support@getdbt.com` to the specified The email contains a link to create an account. When the user clicks on this they will be brought to one of two screens depending on whether SSO is configured or not. - + @@ -48,7 +48,7 @@ The email contains a link to create an account. 
When the user clicks on this the The default settings send the email, the user clicks the link, and is prompted to create their account: - + @@ -56,7 +56,7 @@ The default settings send the email, the user clicks the link, and is prompted t If SSO is configured for the environment, the user clicks the link, is brought to a confirmation screen, and presented with a link to authenticate against the company's identity provider: - + @@ -73,4 +73,4 @@ Once the user completes this process, their email and user information will popu * What happens if I need to resend the invitation? _From the Users page, click on the invite record, and you will be presented with the option to resend the invitation._ * What can I do if I entered an email address incorrectly? _From the Users page, click on the invite record, and you will be presented with the option to revoke it. Once revoked, generate a new invitation to the correct email address._ - \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md b/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md index b0930af16f7..87018b14d56 100644 --- a/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md +++ b/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md @@ -28,7 +28,7 @@ To get started, you need to create a client ID and secret for [authentication](h In the BigQuery console, navigate to **APIs & Services** and select **Credentials**: - + On the **Credentials** page, you can see your existing keys, client IDs, and service accounts. @@ -46,7 +46,7 @@ Fill in the application, replacing `YOUR_ACCESS_URL` with the [appropriate Acces Then click **Create** to create the BigQuery OAuth app and see the app client ID and secret values. These values are available even if you close the app screen, so this isn't the only chance you have to save them. 
- + @@ -59,7 +59,7 @@ Now that you have an OAuth app set up in BigQuery, you'll need to add the client - add the client ID and secret from the BigQuery OAuth app under the **OAuth2.0 Settings** section - + ### Authenticating to BigQuery Once the BigQuery OAuth app is set up for a dbt Cloud project, each dbt Cloud user will need to authenticate with BigQuery in order to use the IDE. To do so: @@ -68,10 +68,10 @@ Once the BigQuery OAuth app is set up for a dbt Cloud project, each dbt Cloud us - Select **Credentials**. - choose your project from the list - select **Authenticate BigQuery Account** - + You will then be redirected to BigQuery and asked to approve the drive, cloud platform, and BigQuery scopes, unless the connection is less privileged. - + Select **Allow**. This redirects you back to dbt Cloud. You should now be an authenticated BigQuery user, ready to use the dbt Cloud IDE. diff --git a/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md b/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md index 8dcbb42ffa7..679133b7844 100644 --- a/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md +++ b/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md @@ -60,7 +60,7 @@ Now that you have an OAuth app set up in Databricks, you'll need to add the clie - select **Connection** to edit the connection details - add the `OAuth Client ID` and `OAuth Client Secret` from the Databricks OAuth app under the **Optional Settings** section - + ### Authenticating to Databricks (dbt Cloud IDE developer) @@ -72,6 +72,6 @@ Once the Databricks connection via OAuth is set up for a dbt Cloud project, each - Select `OAuth` as the authentication method, and click **Save** - Finalize by clicking the **Connect Databricks Account** button - + You will then be redirected to Databricks and asked to approve the connection. This redirects you back to dbt Cloud. You should now be an authenticated Databricks user, ready to use the dbt Cloud IDE. 
diff --git a/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md b/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md index 8e38a60dd27..5b9abb6058a 100644 --- a/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md +++ b/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md @@ -68,7 +68,7 @@ from Enter the Client ID and Client Secret into dbt Cloud to complete the creation of your Connection. - + ### Authorize Developer Credentials @@ -76,7 +76,7 @@ Once Snowflake SSO is enabled, users on the project will be able to configure th ### SSO OAuth Flow Diagram - + Once a user has authorized dbt Cloud with Snowflake via their identity provider, Snowflake will return a Refresh Token to the dbt Cloud application. dbt Cloud is then able to exchange this refresh token for an Access Token which can then be used to open a Snowflake connection and execute queries in the dbt Cloud IDE on behalf of users. diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md b/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md index 1e45de190f5..19779baf615 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md @@ -52,7 +52,7 @@ Client Secret for use in dbt Cloud. | **Authorized domains** | `getdbt.com` (US multi-tenant) `getdbt.com` and `dbt.com`(US Cell 1) `dbt.com` (EMEA or AU) | If deploying into a VPC, use the domain for your deployment | | **Scopes** | `email, profile, openid` | The default scopes are sufficient | - + 6. Save the **Consent screen** settings to navigate back to the **Create OAuth client id** page. @@ -65,7 +65,7 @@ Client Secret for use in dbt Cloud. | **Authorized Javascript origins** | `https://YOUR_ACCESS_URL` | | **Authorized Redirect URIs** | `https://YOUR_AUTH0_URI/login/callback` | - + 8. Press "Create" to create your new credentials. 
A popup will appear with a **Client ID** and **Client Secret**. Write these down as you will need them later! @@ -77,7 +77,7 @@ Group Membership information from the GSuite API. To enable the Admin SDK for this project, navigate to the [Admin SDK Settings page](https://console.developers.google.com/apis/api/admin.googleapis.com/overview) and ensure that the API is enabled. - + ## Configuration in dbt Cloud @@ -99,7 +99,7 @@ Settings. Cloud by navigating to `https://YOUR_ACCESS_URL/enterprise-login/LOGIN-SLUG`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cloud/regions-ip-addresses) for your region and plan. The `LOGIN-SLUG` must be unique across all dbt Cloud accounts, so pick a slug that uniquely identifies your company. - + 3. Click **Save & Authorize** to authorize your credentials. You should be dropped into the GSuite OAuth flow and prompted to log into dbt Cloud with your work email address. If authentication is successful, you will be @@ -109,7 +109,7 @@ Settings. you do not see a `groups` entry in the IdP attribute list, consult the following Troubleshooting steps. - + If the verification information looks appropriate, then you have completed the configuration of GSuite SSO. diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md b/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md index 3bb3f7165a3..ba925fa2c24 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md @@ -97,15 +97,23 @@ You can use the instructions in this section to configure Okta as your identity 1. Log into your Okta account. Using the Admin dashboard, create a new app. - + -1. Select the following configurations: +2. Select the following configurations: - **Platform**: Web - **Sign on method**: SAML 2.0 -2. Click **Create** to continue the setup process. +3. Click **Create** to continue the setup process. 
- + ### Configure the Okta application @@ -123,7 +131,11 @@ Login slugs must be unique across all dbt Cloud accounts, so pick a slug that un 2. Click **Next** to continue. - + ### Configure SAML Settings @@ -133,12 +145,12 @@ Login slugs must be unique across all dbt Cloud accounts, so pick a slug that un * **Audience URI (SP Entity ID)**: `urn:auth0::` * **Relay State**: `` - + -1. Map your organization's Okta User and Group Attributes to the format that +2. Map your organization's Okta User and Group Attributes to the format that dbt Cloud expects by using the Attribute Statements and Group Attribute Statements forms. -1. The following table illustrates expected User Attribute Statements: +3. The following table illustrates expected User Attribute Statements: | Name | Name format | Value | Description | | -------------- | ----------- | -------------------- | -------------------------- | @@ -146,7 +158,7 @@ dbt Cloud expects by using the Attribute Statements and Group Attribute Statemen | `first_name` | Unspecified | `user.firstName` | _The user's first name_ | | `last_name` | Unspecified | `user.lastName` | _The user's last name_ | -2. The following table illustrates expected **Group Attribute Statements**: +4. The following table illustrates expected **Group Attribute Statements**: | Name | Name format | Filter | Value | Description | | -------- | ----------- | ------------- | ----- | ------------------------------------- | @@ -160,9 +172,13 @@ only returns 100 groups for each user, so if your users belong to more than 100 IdP groups, you will need to use a more restrictive filter**. Please contact support if you have any questions. - + -1. Click **Next** to continue. +5. Click **Next** to continue. ### Finish Okta setup @@ -171,7 +187,11 @@ support if you have any questions. 3. Click **Finish** to finish setting up the app. - + ### View setup instructions @@ -179,11 +199,19 @@ app. 2. 
In the steps below, you'll supply these values in your dbt Cloud Account Settings to complete the integration between Okta and dbt Cloud. - + - + -1. After creating the Okta application, follow the instructions in the [dbt Cloud Setup](#dbt-cloud-setup) +3. After creating the Okta application, follow the instructions in the [dbt Cloud Setup](#dbt-cloud-setup) section to complete the integration. ## Google integration @@ -398,11 +426,11 @@ To complete setup, follow the steps below in dbt Cloud: | Identity Provider Issuer | Paste the **Identity Provider Issuer** shown in the IdP setup instructions | | X.509 Certificate | Paste the **X.509 Certificate** shown in the IdP setup instructions;
**Note:** When the certificate expires, an Idp admin will have to generate a new one to be pasted into dbt Cloud for uninterrupted application access. | | Slug | Enter your desired login slug. | + - - -1. Click **Save** to complete setup for the SAML 2.0 integration. -2. After completing the setup, you can navigate to the URL generated for your account's _slug_ to test logging in with your identity provider. Additionally, users added the the SAML 2.0 app will be able to log in to dbt Cloud from the IdP directly. +4. Click **Save** to complete setup for the SAML 2.0 integration. +5. After completing the setup, you can navigate to the URL generated for your account's _slug_ to test logging in with your identity provider. Additionally, users added the the SAML 2.0 app will be able to log in to dbt Cloud from the IdP directly. diff --git a/website/docs/docs/cloud/manage-access/sso-overview.md b/website/docs/docs/cloud/manage-access/sso-overview.md index 938587d59b3..b4954955c8c 100644 --- a/website/docs/docs/cloud/manage-access/sso-overview.md +++ b/website/docs/docs/cloud/manage-access/sso-overview.md @@ -24,7 +24,7 @@ Once you configure SSO, even partially, you cannot disable or revert it. When yo The diagram below explains the basic process by which users are provisioned in dbt Cloud upon logging in with SSO. - + #### Diagram Explanation diff --git a/website/docs/docs/cloud/secure/ip-restrictions.md b/website/docs/docs/cloud/secure/ip-restrictions.md index a0206ca038d..034b3a6c144 100644 --- a/website/docs/docs/cloud/secure/ip-restrictions.md +++ b/website/docs/docs/cloud/secure/ip-restrictions.md @@ -71,6 +71,6 @@ Once you are done adding all your ranges, IP restrictions can be enabled by sele Once enabled, when someone attempts to access dbt Cloud from a restricted IP, they will encounter one of the following messages depending on whether they use email & password or SSO login. 
- + - + diff --git a/website/docs/docs/cloud/secure/postgres-privatelink.md b/website/docs/docs/cloud/secure/postgres-privatelink.md index 95749bf913b..ef07d15c128 100644 --- a/website/docs/docs/cloud/secure/postgres-privatelink.md +++ b/website/docs/docs/cloud/secure/postgres-privatelink.md @@ -49,13 +49,13 @@ On the provisioned VPC endpoint service, click the **Allow principals** tab. Cli - Principal: `arn:aws:iam::346425330055:role/MTPL_Admin` - + ### 3. Obtain VPC Endpoint Service Name Once the VPC Endpoint Service is provisioned, you can find the service name in the AWS console by navigating to **VPC** → **Endpoint Services** and selecting the appropriate endpoint service. You can copy the service name field value and include it in your communication to dbt Cloud support. - + ### 4. Add the required information to the template below, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support): ``` diff --git a/website/docs/docs/cloud/secure/redshift-privatelink.md b/website/docs/docs/cloud/secure/redshift-privatelink.md index da5312876fb..c42c703556b 100644 --- a/website/docs/docs/cloud/secure/redshift-privatelink.md +++ b/website/docs/docs/cloud/secure/redshift-privatelink.md @@ -23,17 +23,17 @@ While Redshift Serverless does support Redshift-managed type VPC endpoints, this 1. On the running Redshift cluster, select the **Properties** tab. - + 2. In the **Granted accounts** section, click **Grant access**. - + 3. Enter the AWS account ID: `346425330055` - _NOTE: This account ID only applies to dbt Cloud Multi-Tenant environments. For Virtual Private/Single-Tenant account IDs please contact [Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support)._ 4. 
Choose **Grant access to all VPCs** —or— (optional) contact [Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support) for the appropriate regional VPC ID to designate in the **Grant access to specific VPCs** field. - + 5. Add the required information to the following template, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support): @@ -62,14 +62,14 @@ Creating an Interface VPC PrivateLink connection requires creating multiple AWS - **Standard Redshift** - Use IP addresses from the Redshift cluster’s **Network Interfaces** whenever possible. While IPs listed in the **Node IP addresses** section will work, they are also more likely to change. - + - There will likely be only one Network Interface (NI) to start, but if the cluster fails over to another availability zone (AZ), a new NI will also be created for that AZ. The NI IP from the original AZ will still work, but the new NI IP can also be added to the Target Group. If adding additional IPs, note that the NLB will also need to add the corresponding AZ. Once created, the NI(s) should stay the same (This is our observation from testing, but AWS does not officially document it). - **Redshift Serverless** - To find the IP addresses for Redshift Serverless instance locate and copy the endpoint (only the URL listed before the port) in the Workgroup configuration section of the AWS console for the instance. - + - From a command line run the command `nslookup ` using the endpoint found in the previous step and use the associated IP(s) for the Target Group. @@ -85,13 +85,13 @@ On the provisioned VPC endpoint service, click the **Allow principals** tab. Cli - Principal: `arn:aws:iam::346425330055:role/MTPL_Admin` - + ### 3. 
Obtain VPC Endpoint Service Name Once the VPC Endpoint Service is provisioned, you can find the service name in the AWS console by navigating to **VPC** → **Endpoint Services** and selecting the appropriate endpoint service. You can copy the service name field value and include it in your communication to dbt Cloud support. - + ### 4. Add the required information to the template below, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support): ``` diff --git a/website/docs/docs/cloud/secure/snowflake-privatelink.md b/website/docs/docs/cloud/secure/snowflake-privatelink.md index bc8f30a5566..dd046259e4e 100644 --- a/website/docs/docs/cloud/secure/snowflake-privatelink.md +++ b/website/docs/docs/cloud/secure/snowflake-privatelink.md @@ -27,7 +27,7 @@ Users connecting to Snowflake using SSO over a PrivateLink connection from dbt C - AWS account ID: `346425330055` - _NOTE: This account ID only applies to dbt Cloud Multi-Tenant environments. For Virtual Private/Single-Tenant account IDs please contact [Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support)._ - You will need to have `ACCOUNTADMIN` access to the Snowflake instance to submit a Support request. - + 2. After Snowflake has granted the requested access, run the Snowflake system function [SYSTEM$GET_PRIVATELINK_CONFIG](https://docs.snowflake.com/en/sql-reference/functions/system_get_privatelink_config.html) and copy the output. 
diff --git a/website/docs/docs/cloud/secure/vcs-privatelink.md b/website/docs/docs/cloud/secure/vcs-privatelink.md index 3007626190a..13bb97dd6cd 100644 --- a/website/docs/docs/cloud/secure/vcs-privatelink.md +++ b/website/docs/docs/cloud/secure/vcs-privatelink.md @@ -15,7 +15,7 @@ You will learn, at a high level, the resources necessary to implement this solut ## PrivateLink connection overview - + ### Required resources for creating a connection @@ -56,7 +56,7 @@ To complete the connection, dbt Labs must now provision a VPC Endpoint to connec - VPC Endpoint Service name: - + - **DNS configuration:** If the connection to the VCS service requires a custom domain and/or URL for TLS, a private hosted zone can be configured by the dbt Labs Infrastructure team in the dbt Cloud private network. For example: - **Private hosted zone:** `examplecorp.com` @@ -66,7 +66,7 @@ To complete the connection, dbt Labs must now provision a VPC Endpoint to connec When you have been notified that the resources are provisioned within the dbt Cloud environment, you must accept the endpoint connection (unless the VPC Endpoint Service is set to auto-accept connection requests). Requests can be accepted through the AWS console, as seen below, or through the AWS CLI. - + Once you accept the endpoint connection request, you can use the PrivateLink endpoint in dbt Cloud. @@ -77,6 +77,6 @@ Once dbt confirms that the PrivateLink integration is complete, you can use it i 2. Select the configured endpoint from the drop down list. 3. Click **Save**. - + - \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md b/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md index b183735da76..e104ea8640c 100644 --- a/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md +++ b/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md @@ -16,7 +16,7 @@ To set up a job to generate docs: 1. 
In the top left, click **Deploy** and select **Jobs**. 2. Create a new job or select an existing job and click **Settings**. 3. Under "Execution Settings," select **Generate docs on run**. - + 4. Click **Save**. Proceed to [configure project documentation](#configure-project-documentation) so your project generates the documentation when this job runs. @@ -44,7 +44,7 @@ You configure project documentation to generate documentation when the job you s 3. Navigate to **Projects** and select the project that needs documentation. 4. Click **Edit**. 5. Under **Artifacts**, select the job that should generate docs when it runs. - + 6. Click **Save**. ## Generating documentation @@ -52,7 +52,7 @@ You configure project documentation to generate documentation when the job you s To generate documentation in the dbt Cloud IDE, run the `dbt docs generate` command in the Command Bar in the dbt Cloud IDE. This command will generate the Docs for your dbt project as it exists in development in your IDE session. - + After generating your documentation, you can click the **Book** icon above the file tree, to see the latest version of your documentation rendered in a new browser window. @@ -65,4 +65,4 @@ These generated docs always show the last fully successful run, which means that The dbt Cloud IDE makes it possible to view [documentation](/docs/collaborate/documentation) for your dbt project while your code is still in development. With this workflow, you can inspect and verify what your project's generated documentation will look like before your changes are released to production. 
- + diff --git a/website/docs/docs/collaborate/documentation.md b/website/docs/docs/collaborate/documentation.md index b6636a84eee..1a989806851 100644 --- a/website/docs/docs/collaborate/documentation.md +++ b/website/docs/docs/collaborate/documentation.md @@ -29,7 +29,7 @@ Importantly, dbt also provides a way to add **descriptions** to models, columns, Here's an example docs site: - + ## Adding descriptions to your project To add descriptions to your project, use the `description:` key in the same files where you declare [tests](/docs/build/data-tests), like so: @@ -177,17 +177,17 @@ up to page views and sessions. ## Navigating the documentation site Using the docs interface, you can navigate to the documentation for a specific model. That might look something like this: - + Here, you can see a representation of the project structure, a markdown description for a model, and a list of all of the columns (with documentation) in the model. From a docs page, you can click the green button in the bottom-right corner of the webpage to expand a "mini-map" of your DAG. This pane (shown below) will display the immediate parents and children of the model that you're exploring. - + In this example, the `fct_subscription_transactions` model only has one direct parent. By clicking the "Expand" button in the top-right corner of the window, we can pivot the graph horizontally and view the full lineage for our model. This lineage is filterable using the `--select` and `--exclude` flags, which are consistent with the semantics of [model selection syntax](/reference/node-selection/syntax). Further, you can right-click to interact with the DAG, jump to documentation, or share links to your graph visualization with your coworkers. 
- + ## Deploying the documentation site diff --git a/website/docs/docs/collaborate/explore-multiple-projects.md b/website/docs/docs/collaborate/explore-multiple-projects.md index 9fd4be3bfae..3be35110a37 100644 --- a/website/docs/docs/collaborate/explore-multiple-projects.md +++ b/website/docs/docs/collaborate/explore-multiple-projects.md @@ -11,12 +11,12 @@ The resource-level lineage graph for a given project displays the cross-project When you view an upstream (parent) project, its public models display a counter icon in the upper right corner indicating how many downstream (child) projects depend on them. Selecting a model reveals the lineage indicating the projects dependent on that model. These counts include all projects listing the upstream one as a dependency in its `dependencies.yml`, even without a direct `{{ ref() }}`. Selecting a project node from a public model opens its detailed lineage graph, which is subject to your [permission](/docs/cloud/manage-access/enterprise-permissions). - + When viewing a downstream (child) project that imports and refs public models from upstream (parent) projects, public models will show up in the lineage graph and display an icon on the graph edge that indicates what the relationship is to a model from another project. Hovering over this icon indicates the specific dbt Cloud project that produces that model. Double-clicking on a model from another project opens the resource-level lineage graph of the parent project, which is subject to your permissions. - + ## Explore the project-level lineage graph diff --git a/website/docs/docs/collaborate/git/managed-repository.md b/website/docs/docs/collaborate/git/managed-repository.md index 6112b84d4c6..db8e9840ccd 100644 --- a/website/docs/docs/collaborate/git/managed-repository.md +++ b/website/docs/docs/collaborate/git/managed-repository.md @@ -13,7 +13,7 @@ To set up a project with a managed repository: 4. Select **Managed**. 5. Enter a name for the repository. 
For example, "analytics" or "dbt-models." 6. Click **Create**. - + dbt Cloud will host and manage this repository for you. If in the future you choose to host this repository elsewhere, you can export the information from dbt Cloud at any time. diff --git a/website/docs/docs/collaborate/git/merge-conflicts.md b/website/docs/docs/collaborate/git/merge-conflicts.md index 133a096da9c..c3c19b1e2a1 100644 --- a/website/docs/docs/collaborate/git/merge-conflicts.md +++ b/website/docs/docs/collaborate/git/merge-conflicts.md @@ -35,9 +35,9 @@ The dbt Cloud IDE will display: - The file name colored in red in the **Changes** section, with a warning icon. - If you press commit without resolving the conflict, the dbt Cloud IDE will prompt a pop up box with a list which files need to be resolved. - + - + ## Resolve merge conflicts @@ -51,7 +51,7 @@ You can seamlessly resolve merge conflicts that involve competing line changes i 6. Repeat this process for every file that has a merge conflict. - + :::info Edit conflict files - If you open the conflict file under **Changes**, the file name will display something like `model.sql (last commit)` and is fully read-only and cannot be edited.
@@ -67,6 +67,6 @@ When you've resolved all the merge conflicts, the last step would be to commit t 3. The dbt Cloud IDE will return to its normal state and you can continue developing! - + - + diff --git a/website/docs/docs/collaborate/git/pr-template.md b/website/docs/docs/collaborate/git/pr-template.md index b85aa8a0d51..ddb4948dad9 100644 --- a/website/docs/docs/collaborate/git/pr-template.md +++ b/website/docs/docs/collaborate/git/pr-template.md @@ -9,7 +9,7 @@ open a new Pull Request for the code changes. To enable this functionality, ensu that a PR Template URL is configured in the Repository details page in your Account Settings. If this setting is blank, the IDE will prompt users to merge the changes directly into their default branch. - + ### PR Template URL by git provider diff --git a/website/docs/docs/collaborate/model-performance.md b/website/docs/docs/collaborate/model-performance.md index f4dcb5970dd..7ef675b4e1e 100644 --- a/website/docs/docs/collaborate/model-performance.md +++ b/website/docs/docs/collaborate/model-performance.md @@ -23,11 +23,11 @@ You can pinpoint areas for performance enhancement by using the Performance over Each data point links to individual models in Explorer. - + You can view historical metadata for up to the past three months. Select the time horizon using the filter, which defaults to a two-week lookback. - + ## The Model performance tab @@ -38,4 +38,4 @@ You can view trends in execution times, counts, and failures by using the Model Clicking on a data point reveals a table listing all job runs for that day, with each row providing a direct link to the details of a specific run. 
- \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/collaborate/project-recommendations.md b/website/docs/docs/collaborate/project-recommendations.md index 97585f0cb98..e6263a875fc 100644 --- a/website/docs/docs/collaborate/project-recommendations.md +++ b/website/docs/docs/collaborate/project-recommendations.md @@ -21,7 +21,7 @@ The Recommendations overview page includes two top-level metrics measuring the t - **Model test coverage** — The percent of models in your project (models not from a package or imported via dbt Mesh) with at least one dbt test configured on them. - **Model documentation coverage** — The percent of models in your project (models not from a package or imported via dbt Mesh) with a description. - + ## List of rules @@ -45,6 +45,6 @@ The Recommendations overview page includes two top-level metrics measuring the t Models, sources and exposures each also have a Recommendations tab on their resource details page, with the specific recommendations that correspond to that resource: - + diff --git a/website/docs/docs/dbt-cloud-apis/discovery-api.md b/website/docs/docs/dbt-cloud-apis/discovery-api.md index 983674faedf..747128cf7bc 100644 --- a/website/docs/docs/dbt-cloud-apis/discovery-api.md +++ b/website/docs/docs/dbt-cloud-apis/discovery-api.md @@ -9,7 +9,7 @@ By leveraging the metadata in dbt Cloud, you can create systems for data monitor You can access the Discovery API through [ad hoc queries](/docs/dbt-cloud-apis/discovery-querying), custom applications, a wide range of [partner ecosystem integrations](https://www.getdbt.com/product/integrations/) (like BI/analytics, catalog and governance, and quality and observability), and by using dbt Cloud features like [model timing](/docs/deploy/run-visibility#model-timing) and [dashboard status tiles](/docs/deploy/dashboard-status-tiles). 
- + You can query the dbt Cloud metadata: @@ -36,7 +36,7 @@ Use the API to look at historical information like model build time to determine You can use, for example, the [model timing](/docs/deploy/run-visibility#model-timing) tab to help identify and optimize bottlenecks in model builds: - + @@ -54,7 +54,7 @@ Use the API to find and understand dbt assets in integrated tools using informat Data producers must manage and organize data for stakeholders, while data consumers need to quickly and confidently analyze data on a large scale to make informed decisions that improve business outcomes and reduce organizational overhead. The API is useful for discovery data experiences in catalogs, analytics, apps, and machine learning (ML) tools. It can help you understand the origin and meaning of datasets for your analysis. - + @@ -68,7 +68,7 @@ Use the API to review who developed the models and who uses them to help establi Use the API to review dataset changes and uses by examining exposures, lineage, and dependencies. From the investigation, you can learn how to define and build more effective dbt projects. For more details, refer to [Development](/docs/dbt-cloud-apis/discovery-use-cases-and-examples#development). - + diff --git a/website/docs/docs/dbt-cloud-apis/discovery-querying.md b/website/docs/docs/dbt-cloud-apis/discovery-querying.md index 4e9c9cf051c..35c092adb4b 100644 --- a/website/docs/docs/dbt-cloud-apis/discovery-querying.md +++ b/website/docs/docs/dbt-cloud-apis/discovery-querying.md @@ -92,14 +92,14 @@ Refer to the [Apollo explorer documentation](https://www.apollographql.com/docs/
- + 1. Run your query by clicking the blue query button in the top right of the **Operation** editor (to the right of the query). You should see a successful query response on the right side of the explorer. - + ### Fragments diff --git a/website/docs/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md b/website/docs/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md index c4ddb3fbc5f..8efb1ec0d37 100644 --- a/website/docs/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md +++ b/website/docs/docs/dbt-cloud-apis/discovery-use-cases-and-examples.md @@ -25,7 +25,7 @@ For performance use cases, people typically query the historical or latest appli It’s helpful to understand how long it takes to build models (tables) and tests to execute during a dbt run. Longer model build times result in higher infrastructure costs and fresh data arriving later to stakeholders. Analyses like these can be in observability tools or ad-hoc queries, like in a notebook. - +
Example query with code @@ -158,10 +158,10 @@ plt.show() Plotting examples: - + - +
@@ -687,7 +687,7 @@ query ($environmentId: BigInt!, $first: Int!) { Lineage, enabled by the `ref` function, is at the core of dbt. Understanding lineage provides many benefits, such as understanding the structure and relationships of datasets (and metrics) and performing impact-and-root-cause analyses to resolve or present issues given changes to definitions or source data. With the Discovery API, you can construct lineage using the `parents` nodes or its `children` and query the entire upstream lineage using `ancestors`. - +
Example query with code @@ -1056,7 +1056,7 @@ For development use cases, people typically query the historical or latest defin ### How is this model or metric used in downstream tools? [Exposures](/docs/build/exposures) provide a method to define how a model or metric is actually used in dashboards and other analytics tools and use cases. You can query an exposure’s definition to see how project nodes are used and query its upstream lineage results to understand the state of the data used in it, which powers use cases like a freshness and quality status tile. - +
diff --git a/website/docs/docs/dbt-cloud-apis/service-tokens.md b/website/docs/docs/dbt-cloud-apis/service-tokens.md index a5a8a6c4807..b0b5fbd6cfe 100644 --- a/website/docs/docs/dbt-cloud-apis/service-tokens.md +++ b/website/docs/docs/dbt-cloud-apis/service-tokens.md @@ -110,7 +110,7 @@ On July 18, 2023, dbt Labs made critical infrastructure changes to service accou To rotate your token: 1. Navigate to **Account settings** and click **Service tokens** on the left side pane. 2. Verify the **Created** date for the token is _on or before_ July 18, 2023. - + 3. Click **+ New Token** on the top right side of the screen. Ensure the new token has the same permissions as the old one. 4. Copy the new token and replace the old one in your systems. Store it in a safe place, as it will not be available again once the creation screen is closed. 5. Delete the old token in dbt Cloud by clicking the **trash can icon**. _Only take this action after the new token is in place to avoid service disruptions_. diff --git a/website/docs/docs/dbt-cloud-apis/user-tokens.md b/website/docs/docs/dbt-cloud-apis/user-tokens.md index 5734f8ba35a..77e536b12a5 100644 --- a/website/docs/docs/dbt-cloud-apis/user-tokens.md +++ b/website/docs/docs/dbt-cloud-apis/user-tokens.md @@ -14,7 +14,7 @@ permissions of the user the that they were created for. You can find your User API token in the Profile page under the `API Access` label. - + ## FAQs diff --git a/website/docs/docs/dbt-cloud-environments.md b/website/docs/docs/dbt-cloud-environments.md index 01d24fec9b9..522a354be97 100644 --- a/website/docs/docs/dbt-cloud-environments.md +++ b/website/docs/docs/dbt-cloud-environments.md @@ -38,7 +38,7 @@ To create a new dbt Cloud development environment: To use the dbt Cloud IDE or dbt Cloud CLI, each developer will need to set up [personal development credentials](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud#access-the-cloud-ide) to your warehouse connection in their **Profile Settings**. 
This allows you to set separate target information and maintain individual credentials to connect to your warehouse. - + ## Deployment environment diff --git a/website/docs/docs/dbt-versions/release-notes/73-Jan-2024/partial-parsing.md b/website/docs/docs/dbt-versions/release-notes/73-Jan-2024/partial-parsing.md index 57cd3cc37d3..c0236a30783 100644 --- a/website/docs/docs/dbt-versions/release-notes/73-Jan-2024/partial-parsing.md +++ b/website/docs/docs/dbt-versions/release-notes/73-Jan-2024/partial-parsing.md @@ -11,5 +11,5 @@ By default, dbt parses all the files in your project at the beginning of every d To learn more, refer to [Partial parsing](/docs/deploy/deploy-environments#partial-parsing). - + diff --git a/website/docs/docs/dbt-versions/release-notes/74-Dec-2023/external-attributes.md b/website/docs/docs/dbt-versions/release-notes/74-Dec-2023/external-attributes.md index 80bff71d176..25791b66fb1 100644 --- a/website/docs/docs/dbt-versions/release-notes/74-Dec-2023/external-attributes.md +++ b/website/docs/docs/dbt-versions/release-notes/74-Dec-2023/external-attributes.md @@ -13,4 +13,4 @@ To learn more, refer to [Extended attributes](/docs/dbt-cloud-environments#exten The **Extended Atrributes** text box is available from your environment's settings page: - + diff --git a/website/docs/docs/dbt-versions/release-notes/75-Nov-2023/repo-caching.md b/website/docs/docs/dbt-versions/release-notes/75-Nov-2023/repo-caching.md index b7b0f0f5325..eff15e96cfd 100644 --- a/website/docs/docs/dbt-versions/release-notes/75-Nov-2023/repo-caching.md +++ b/website/docs/docs/dbt-versions/release-notes/75-Nov-2023/repo-caching.md @@ -11,4 +11,4 @@ Now available for dbt Cloud Enterprise plans is a new option to enable Git repos To learn more, refer to [Repo caching](/docs/deploy/deploy-environments#git-repository-caching). 
- \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/native-retry-support-rn.md b/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/native-retry-support-rn.md index f4226627792..20e56879940 100644 --- a/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/native-retry-support-rn.md +++ b/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/native-retry-support-rn.md @@ -12,4 +12,4 @@ Previously in dbt Cloud, you could only rerun an errored job from start but now You can view which job failed to complete successully, which command failed in the run step, and choose how to rerun it. To learn more, refer to [Retry jobs](/docs/deploy/retry-jobs). - + diff --git a/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/sl-ga.md b/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/sl-ga.md index 0e56c665ac2..a1b59aa6ec1 100644 --- a/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/sl-ga.md +++ b/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/sl-ga.md @@ -20,7 +20,7 @@ It aims to bring the best of modeling and semantics to downstream applications b - dbt Cloud [multi-tenant regional](/docs/cloud/about-cloud/regions-ip-addresses) support for North America, EMEA, and APAC. Single-tenant support coming soon. - Use the APIs to call an export (a way to build tables in your data platform), then access them in your preferred BI tool. Starting from dbt v1.7 or higher, you will be able to schedule exports as part of your dbt job. - + The dbt Semantic Layer is available to [dbt Cloud Team or Enterprise](https://www.getdbt.com/) multi-tenant plans on dbt v1.6 or higher. - Team and Enterprise customers can use 1,000 Queried Metrics per month for no additional cost on a limited trial basis, subject to reasonable use limitations. Refer to [Billing](/docs/cloud/billing#what-counts-as-a-queried-metric) for more information. 
diff --git a/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/ci-updates-phase2-rn.md b/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/ci-updates-phase2-rn.md index 434d24edcbf..a8ae1ade65b 100644 --- a/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/ci-updates-phase2-rn.md +++ b/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/ci-updates-phase2-rn.md @@ -11,7 +11,7 @@ dbt Cloud now has two distinct job types: [deploy jobs](/docs/deploy/deploy-jobs With two types of jobs, instead of one generic type, we can better guide you through the setup flow. Best practices are built into the default settings so you can go from curious to being set up in seconds. - + And, we now have more efficient state comparisons on CI checks: never waste a build or test on code that hasn’t been changed. We now diff between the Git pull request (PR) code and what’s running in production more efficiently with the introduction of deferral to an environment versus a job. To learn more, refer to [Continuous integration in dbt Cloud](/docs/deploy/continuous-integration). @@ -39,4 +39,4 @@ Below is a comparison table that describes how deploy jobs and CI jobs behave di To check for the job type, review your CI jobs in dbt Cloud's [Run History](/docs/deploy/run-visibility#run-history) and check for the **CI Job** tag below the job name. If it doesn't have this tag, it was misclassified and you need to re-create the job. 
- + diff --git a/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md b/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md index dc2cdb63748..0b588376c34 100644 --- a/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md +++ b/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md @@ -12,4 +12,4 @@ Previously, when dbt Labs released a new [version](/docs/dbt-versions/core#how-d To see which version you are currently using and to upgrade, select **Deploy** in the top navigation bar and select **Environments**. Choose the preferred environment and click **Settings**. Click **Edit** to make a change to the current dbt version. dbt Labs recommends always using the latest version whenever possible to take advantage of new features and functionality. - \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/dbt-versions/release-notes/79-July-2023/faster-run.md b/website/docs/docs/dbt-versions/release-notes/79-July-2023/faster-run.md index b1d152fd91e..5cf1f97ff25 100644 --- a/website/docs/docs/dbt-versions/release-notes/79-July-2023/faster-run.md +++ b/website/docs/docs/dbt-versions/release-notes/79-July-2023/faster-run.md @@ -15,11 +15,11 @@ Read more on how you can experience faster run start execution and how enterpris The Scheduler takes care of preparing each dbt Cloud job to run in your cloud data platform. This [prep](/docs/deploy/job-scheduler#scheduler-queue) involves readying a Kubernetes pod with the right version of dbt installed, setting environment variables, loading data platform credentials, and git provider authorization, amongst other environment-setting tasks. Only after the environment is set up, can dbt execution begin. We display this time to the user in dbt Cloud as “prep time”. 
- + For all its strengths, Kubernetes has challenges, especially with pod management impacting run execution time. We’ve rebuilt our scheduler by ensuring faster job execution with a ready pool of pods to execute customers’ jobs. This means you won't experience long prep times at the top of the hour, and we’re determined to keep runs starting near instantaneously. Don’t just take our word, review the data yourself. - + Jobs scheduled at the top of the hour used to take over 106 seconds to prepare because of the volume of runs the scheduler has to process. Now, even with increased runs, we have reduced prep time to 27 secs (at a maximum) — a 75% speed improvement for runs at peak traffic times! diff --git a/website/docs/docs/dbt-versions/release-notes/80-June-2023/lint-format-rn.md b/website/docs/docs/dbt-versions/release-notes/80-June-2023/lint-format-rn.md index 35a202cf3ea..e99d1fe3e0b 100644 --- a/website/docs/docs/dbt-versions/release-notes/80-June-2023/lint-format-rn.md +++ b/website/docs/docs/dbt-versions/release-notes/80-June-2023/lint-format-rn.md @@ -17,10 +17,10 @@ For more info, read [Lint and format your code](/docs/cloud/dbt-cloud-ide/lint-f - + - + - + diff --git a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md index 38b017baa30..1aabe517076 100644 --- a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md +++ b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md @@ -16,4 +16,4 @@ Highlights include: - Cleaner look and feel with iconography - Helpful tool tips - + diff --git a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-endpoint.md b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-endpoint.md index 86ca532c154..050fd8339a2 100644 --- 
a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-endpoint.md +++ b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-endpoint.md @@ -14,7 +14,7 @@ dbt Labs is making a change to the metadata retrieval policy for Run History in Specifically, all `GET` requests to the dbt Cloud [Runs endpoint](https://docs.getdbt.com/dbt-cloud/api-v2-legacy#tag/Runs) will return information on runs, artifacts, logs, and run steps only for the past 365 days. Additionally, the run history displayed in the dbt Cloud UI will only show runs for the past 365 days. - + We will retain older run history in cold storage and can make it available to customers who reach out to our Support team. To request older run history info, contact the Support team at [support@getdbt.com](mailto:support@getdbt.com) or use the dbt Cloud application chat by clicking the `?` icon in the dbt Cloud UI. diff --git a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md index 0bc4b76d0fc..d4d299b1d36 100644 --- a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md +++ b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md @@ -8,7 +8,7 @@ tags: [May-2023, Scheduler] New usability and design improvements to the **Run History** dashboard in dbt Cloud are now available. These updates allow people to discover the information they need more easily by reducing the number of clicks, surfacing more relevant information, keeping people in flow state, and designing the look and feel to be more intuitive to use. 
- + Highlights include: diff --git a/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md b/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md index 9ceda7749cd..bdc89b4abde 100644 --- a/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md +++ b/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md @@ -10,5 +10,5 @@ To help save compute time, new jobs will no longer be triggered to run by defaul For more information, refer to [Deploy jobs](/docs/deploy/deploy-jobs). - + diff --git a/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md b/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md index 41e1a5265ca..2d0488d4488 100644 --- a/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md +++ b/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md @@ -13,4 +13,4 @@ Large DAGs can take a long time (10 or more seconds, if not minutes) to render a The new button prevents large DAGs from rendering automatically. Instead, you can select **Render Lineage** to load the visualization. This should affect about 15% of the DAGs. - + diff --git a/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md b/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md index 90e6ac72fea..307786c6b85 100644 --- a/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md +++ b/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md @@ -10,4 +10,4 @@ We fixed an issue where a spotty internet connection could cause the “IDE sess We updated the health check logic so it now excludes client-side connectivity issues from the IDE session check. If you lose your internet connection, we no longer update the health-check state. 
Now, losing internet connectivity will no longer cause this unexpected message. - + diff --git a/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md b/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md index 46c1f4bbd15..9ff5986b4da 100644 --- a/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md +++ b/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md @@ -9,4 +9,4 @@ tags: [v1.1.46, March-02-2022] dbt Cloud now shows "waiting time" and "prep time" for a run, which used to be expressed in aggregate as "queue time". Waiting time captures the time dbt Cloud waits to run your job if there isn't an available run slot or if a previous run of the same job is still running. Prep time represents the time it takes dbt Cloud to ready your job to run in your cloud data warehouse. - + diff --git a/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md b/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md index 75697d32d17..052611f66e6 100644 --- a/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md +++ b/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md @@ -9,7 +9,7 @@ In dbt Cloud, both jobs and environments are configured to use a specific versio Navigate to the settings page of an environment, then click **edit**. Click the **dbt Version** dropdown bar and make your selection. From this list, you can select an available version of Core to associate with this environment. - + Be sure to save your changes before navigating away. @@ -17,7 +17,7 @@ Be sure to save your changes before navigating away. Each job in dbt Cloud can be configured to inherit parameters from the environment it belongs to. - + The example job seen in the screenshot above belongs to the environment "Prod". It inherits the dbt version of its environment as shown by the **Inherited from ENVIRONMENT_NAME (DBT_VERSION)** selection. 
You may also manually override the dbt version of a specific job to be any of the current Core releases supported by Cloud by selecting another option from the dropdown. @@ -275,7 +275,7 @@ Once you have your project compiling and running on the latest version of dbt in - + Then add a job to the new testing environment that replicates one of the production jobs your team relies on. If that job runs smoothly, you should be all set to merge your branch into main and change your development and deployment environments in your main dbt project to run off the newest version of dbt Core. diff --git a/website/docs/docs/deploy/artifacts.md b/website/docs/docs/deploy/artifacts.md index 7ecc05355a0..9b3ae71e79c 100644 --- a/website/docs/docs/deploy/artifacts.md +++ b/website/docs/docs/deploy/artifacts.md @@ -10,11 +10,11 @@ When running dbt jobs, dbt Cloud generates and saves *artifacts*. You can use th While running any job can produce artifacts, you should only associate one production job with a given project to produce the project's artifacts. You can designate this connection in the **Project details** page. To access this page, click the gear icon in the upper right, select **Account Settings**, select your project, and click **Edit** in the lower right. Under **Artifacts**, select the jobs you want to produce documentation and source freshness artifacts for. - + If you don't see your job listed, you might need to edit the job and select **Run source freshness** and **Generate docs on run**. - + When you add a production job to a project, dbt Cloud updates the content and provides links to the production documentation and source freshness artifacts it generated for that project. You can see these links by clicking **Deploy** in the upper left, selecting **Jobs**, and then selecting the production job. From the job page, you can select a specific run to see how artifacts were updated for that run only. 
@@ -25,10 +25,10 @@ When set up, dbt Cloud updates the **Documentation** link in the header tab so i Note that both the job's commands and the docs generate step (triggered by the **Generate docs on run** checkbox) must succeed during the job invocation for the project-level documentation to be populated or updated. - + ### Source Freshness As with Documentation, configuring a job for the Source Freshness artifact setting also updates the Data Sources link under **Deploy**. The new link points to the latest Source Freshness report for the selected job. - + diff --git a/website/docs/docs/deploy/ci-jobs.md b/website/docs/docs/deploy/ci-jobs.md index fd4da3379b7..149a6951fdc 100644 --- a/website/docs/docs/deploy/ci-jobs.md +++ b/website/docs/docs/deploy/ci-jobs.md @@ -39,7 +39,7 @@ To make CI job creation easier, many options on the **CI job** page are set to d - **Generate docs on run** — Enable this option if you want to [generate project docs](/docs/collaborate/build-and-view-your-docs) when this job runs. This option is disabled by default since most teams do not want to test doc generation on every CI check. - + 4. (optional) Options in the **Advanced Settings** section: - **Environment Variables** — Define [environment variables](/docs/build/environment-variables) to customize the behavior of your project when this CI job runs. You can specify that a CI job is running in a _Staging_ or _CI_ environment by setting an environment variable and modifying your project code to behave differently, depending on the context. It's common for teams to process only a subset of data for CI runs, using environment variables to branch logic in their dbt project code. @@ -49,7 +49,7 @@ To make CI job creation easier, many options on the **CI job** page are set to d - **Threads** — By default, it’s set to 4 [threads](/docs/core/connect-data-platform/connection-profiles#understanding-threads). Increase the thread count to increase model execution concurrency. 
- **Run source freshness** — Enable this option to invoke the `dbt source freshness` command before running this CI job. Refer to [Source freshness](/docs/deploy/source-freshness) for more details. - + ## Trigger a CI job with the API @@ -77,15 +77,15 @@ The green checkmark means the dbt build and tests were successful. Clicking on t ### GitHub pull request example - + ### GitLab pull request example - + ### Azure DevOps pull request example - + ## Troubleshooting @@ -117,10 +117,10 @@ If you're experiencing any issues, review some of the common questions and answe First, make sure you have the native GitHub authentication, native GitLab authentication, or native Azure DevOps authentication set up depending on which git provider you use. After you have gone through those steps, go to Account Settings, select Projects and click on the project you'd like to reconnect through native GitHub, GitLab, or Azure DevOps auth. Then click on the repository link.



Once you're on the repository page, select Edit and then Disconnect Repository at the bottom.

- +

Confirm that you'd like to disconnect your repository. You should then see a new Configure a repository link in your old repository's place. Click through to the configuration page:

- +

Select the GitHub, GitLab, or Azure DevOps tab and reselect your repository. That should complete the setup of the project and enable you to set up a dbt Cloud CI job.
diff --git a/website/docs/docs/deploy/continuous-integration.md b/website/docs/docs/deploy/continuous-integration.md index 3fe50922bfd..0f87965aada 100644 --- a/website/docs/docs/deploy/continuous-integration.md +++ b/website/docs/docs/deploy/continuous-integration.md @@ -6,7 +6,7 @@ description: "You can set up continuous integration (CI) checks to test every si To implement a continuous integration (CI) workflow in dbt Cloud, you can set up automation that tests code changes by running [CI jobs](/docs/deploy/ci-jobs) before merging to production. dbt Cloud tracks the state of what’s running in your production environment so, when you run a CI job, only the modified data assets in your pull request (PR) and their downstream dependencies are built and tested in a staging schema. You can also view the status of the CI checks (tests) directly from within the PR; this information is posted to your Git provider as soon as a CI job completes. Additionally, you can enable settings in your Git provider that allow PRs only with successful CI checks be approved for merging. - + Using CI helps: @@ -20,7 +20,7 @@ When you [set up CI jobs](/docs/deploy/ci-jobs#set-up-ci-jobs), dbt Cloud liste dbt Cloud builds and tests the models affected by the code change in a temporary schema, unique to the PR. This process ensures that the code builds without error and that it matches the expectations as defined by the project's dbt tests. The unique schema name follows the naming convention `dbt_cloud_pr__` (for example, `dbt_cloud_pr_1862_1704`) and can be found in the run details for the given run, as shown in the following image: - + When the CI run completes, you can view the run status directly from within the pull request. dbt Cloud updates the pull request in GitHub, GitLab, or Azure DevOps with a status message indicating the results of the run. The status message states whether the models and tests ran successfully or not. 
@@ -48,5 +48,5 @@ Below describes the conditions when CI checks are run concurrently and when they When you push a new commit to a PR, dbt Cloud enqueues a new CI run for the latest commit and cancels any CI run that is (now) stale and still in flight. This can happen when you’re pushing new commits while a CI build is still in process and not yet done. By cancelling runs in a safe and deliberate way, dbt Cloud helps improve productivity and reduce data platform spend on wasteful CI runs. - + diff --git a/website/docs/docs/deploy/dashboard-status-tiles.md b/website/docs/docs/deploy/dashboard-status-tiles.md index 2ba93606204..67aa1a93c33 100644 --- a/website/docs/docs/deploy/dashboard-status-tiles.md +++ b/website/docs/docs/deploy/dashboard-status-tiles.md @@ -9,11 +9,11 @@ In dbt Cloud, the [Discovery API](/docs/dbt-cloud-apis/discovery-api) can power ## Functionality The dashboard status tile looks like this: - + The data freshness check fails if any sources feeding into the exposure are stale. The data quality check fails if any dbt tests fail. A failure state could look like this: - + Clicking into **see details** from the Dashboard Status Tile takes you to a landing page where you can learn more about the specific sources, models, and tests feeding into this exposure. @@ -56,11 +56,11 @@ Note that Mode has also built its own [integration](https://mode.com/get-dbt/) w Looker does not allow you to directly embed HTML and instead requires creating a [custom visualization](https://docs.looker.com/admin-options/platform/visualizations). One way to do this for admins is to: - Add a [new visualization](https://fishtown.looker.com/admin/visualizations) on the visualization page for Looker admins. You can use [this URL](https://metadata.cloud.getdbt.com/static/looker-viz.js) to configure a Looker visualization powered by the iFrame. It will look like this: - + - Once you have set up your custom visualization, you can use it on any dashboard! 
You can configure it with the exposure name, jobID, and token relevant to that dashboard. - + ### Tableau Tableau does not require you to embed an iFrame. You only need to use a Web Page object on your Tableau Dashboard and a URL in the following format: @@ -79,7 +79,7 @@ https://metadata.cloud.getdbt.com/exposure-tile?name=&jobId= + ### Sigma @@ -99,4 +99,4 @@ https://metadata.au.dbt.com/exposure-tile?name=&jobId=&to ``` ::: - + diff --git a/website/docs/docs/deploy/deploy-environments.md b/website/docs/docs/deploy/deploy-environments.md index f9f15a25aa2..650fdb1c28a 100644 --- a/website/docs/docs/deploy/deploy-environments.md +++ b/website/docs/docs/deploy/deploy-environments.md @@ -26,13 +26,13 @@ import CloudEnvInfo from '/snippets/_cloud-environments-info.md'; To create a new dbt Cloud development environment, navigate to **Deploy** -> **Environments** and then click **Create Environment**. Select **Deployment** as the environment type. - + ### Set as production environment In dbt Cloud, each project can have one designated deployment environment, which serves as its production environment. This production environment is _essential_ for using features like dbt Explorer and cross-project references. It acts as the source of truth for the project's production state in dbt Cloud. - + ### Semantic Layer @@ -65,7 +65,7 @@ This section will not appear if you are using Redshift, as all values are inferr
- + #### Editable fields @@ -89,7 +89,7 @@ This section will not appear if you are using Spark, as all values are inferred
- + #### Editable fields @@ -108,7 +108,7 @@ This section allows you to determine the credentials that should be used when co
- + #### Editable fields @@ -120,7 +120,7 @@ This section allows you to determine the credentials that should be used when co
- + #### Editable fields @@ -132,7 +132,7 @@ This section allows you to determine the credentials that should be used when co
- + #### Editable fields @@ -151,7 +151,7 @@ This section allows you to determine the credentials that should be used when co
- + #### Editable fields @@ -161,7 +161,7 @@ This section allows you to determine the credentials that should be used when co
- + #### Editable fields @@ -172,7 +172,7 @@ This section allows you to determine the credentials that should be used when co
- + #### Editable fields diff --git a/website/docs/docs/deploy/deploy-jobs.md b/website/docs/docs/deploy/deploy-jobs.md index 3a3dbebd70e..cee6e245359 100644 --- a/website/docs/docs/deploy/deploy-jobs.md +++ b/website/docs/docs/deploy/deploy-jobs.md @@ -38,7 +38,7 @@ You can create a deploy job and configure it to run on [scheduled days and times - **Timing** — Specify whether to [schedule](#schedule-days) the deploy job using **Frequency** that runs the job at specific times of day, **Specific Intervals** that runs the job every specified number of hours, or **Cron Schedule** that runs the job specified using [cron syntax](#custom-cron-schedule). - **Days of the Week** — By default, it’s set to every day when **Frequency** or **Specific Intervals** is chosen for **Timing**. - + 5. (optional) Options in the **Advanced Settings** section: - **Environment Variables** — Define [environment variables](/docs/build/environment-variables) to customize the behavior of your project when the deploy job runs. @@ -53,7 +53,7 @@ You can create a deploy job and configure it to run on [scheduled days and times - **dbt Version** — By default, it’s set to inherit the [dbt version](/docs/dbt-versions/core) from the environment. dbt Labs strongly recommends that you don't change the default setting. This option to change the version at the job level is useful only when you upgrade a project to the next dbt version; otherwise, mismatched versions between the environment and job can lead to confusing behavior. - **Threads** — By default, it’s set to 4 [threads](/docs/core/connect-data-platform/connection-profiles#understanding-threads). Increase the thread count to increase model execution concurrency. - + ### Schedule days @@ -80,7 +80,7 @@ dbt Cloud uses [Coordinated Universal Time](https://en.wikipedia.org/wiki/Coordi To fully customize the scheduling of your job, choose the **Custom cron schedule** option and use the cron syntax. 
With this syntax, you can specify the minute, hour, day of the month, month, and day of the week, allowing you to set up complex schedules like running a job on the first Monday of each month. - + Use tools such as [crontab.guru](https://crontab.guru/) to generate the correct cron syntax. This tool allows you to input cron snippets and returns their plain English translations. diff --git a/website/docs/docs/deploy/deployment-overview.md b/website/docs/docs/deploy/deployment-overview.md index bf55420918c..29934663544 100644 --- a/website/docs/docs/deploy/deployment-overview.md +++ b/website/docs/docs/deploy/deployment-overview.md @@ -104,12 +104,12 @@ Learn how to use dbt Cloud's features to help your team ship timely and quality - + - + - + diff --git a/website/docs/docs/deploy/deployment-tools.md b/website/docs/docs/deploy/deployment-tools.md index 64fcb1dadae..cca2368f38a 100644 --- a/website/docs/docs/deploy/deployment-tools.md +++ b/website/docs/docs/deploy/deployment-tools.md @@ -19,8 +19,8 @@ If your organization is using [Airflow](https://airflow.apache.org/), there are Installing the [dbt Cloud Provider](https://airflow.apache.org/docs/apache-airflow-providers-dbt-cloud/stable/index.html) to orchestrate dbt Cloud jobs. This package contains multiple Hooks, Operators, and Sensors to complete various actions within dbt Cloud. - - + + @@ -71,7 +71,7 @@ If your organization is using [Prefect](https://www.prefect.io/), the way you wi - As jobs are executing, you can poll dbt to see whether or not the job completes without failures, through the [Prefect user interface (UI)](https://docs.prefect.io/ui/overview/). 
- + diff --git a/website/docs/docs/deploy/job-commands.md b/website/docs/docs/deploy/job-commands.md index aa49d638e2c..26fe1931db6 100644 --- a/website/docs/docs/deploy/job-commands.md +++ b/website/docs/docs/deploy/job-commands.md @@ -29,7 +29,7 @@ Every job invocation automatically includes the [`dbt deps`](/reference/commands **Job outcome** — During a job run, the built-in commands are "chained" together. This means if one of the run steps in the chain fails, then the next commands aren't executed, and the entire job fails with an "Error" job status. - + ### Checkbox commands diff --git a/website/docs/docs/deploy/job-notifications.md b/website/docs/docs/deploy/job-notifications.md index 4166cf73da6..548e34fc2f3 100644 --- a/website/docs/docs/deploy/job-notifications.md +++ b/website/docs/docs/deploy/job-notifications.md @@ -23,7 +23,7 @@ You can receive email alerts about jobs by configuring the dbt Cloud email notif If you're an account admin, you can choose a different email address to receive notifications. Select the **Notification email** dropdown and choose another address from the list. The list includes **Internal Users** with access to the account and **External Emails** that have been added. - To add an external email address, select the **Notification email** dropdown and choose **Add external email**. After you add the external email, it becomes available for selection in the **Notification email** dropdown list. External emails can be addresses that are outside of your dbt Cloud account and also for third-party integrations like [channels in Microsoft Teams](https://support.microsoft.com/en-us/office/tip-send-email-to-a-channel-2c17dbae-acdf-4209-a761-b463bdaaa4ca) and [PagerDuty email integration](https://support.pagerduty.com/docs/email-integration-guide). - + 1. Select the **Environment** for the jobs you want to receive notifications about from the dropdown. 
@@ -35,7 +35,7 @@ You can receive email alerts about jobs by configuring the dbt Cloud email notif To set up alerts on jobs from a different environment, select another **Environment** from the dropdown, **Edit** those job notification settings, and **Save** the changes. - + ### Unsubscribe from email notifications 1. From the gear menu, choose **Notification settings**. @@ -75,7 +75,7 @@ Any account admin can edit the Slack notifications but they'll be limited to con To set up alerts on jobs from a different environment, select another **Environment** from the dropdown, **Edit** those job notification settings, and **Save** the changes. - + ### Disable the Slack integration diff --git a/website/docs/docs/deploy/job-scheduler.md b/website/docs/docs/deploy/job-scheduler.md index b4ba711643c..7a4cd740804 100644 --- a/website/docs/docs/deploy/job-scheduler.md +++ b/website/docs/docs/deploy/job-scheduler.md @@ -50,7 +50,7 @@ If there is an available run slot and there isn't an actively running instance o Together, **wait time** plus **prep time** is the total time a run spends in the queue (or **Time in queue**). - + ### Treatment of CI jobs When compared to deployment jobs, the scheduler behaves differently when handling [continuous integration (CI) jobs](/docs/deploy/continuous-integration). It queues a CI job to be processed when it's triggered to run by a Git pull request, and the conditions the scheduler checks to determine if the run can start executing are also different: @@ -80,7 +80,7 @@ The dbt Cloud scheduler prevents too many job runs from clogging the queue by ca The scheduler prevents queue clog by canceling runs that aren't needed, ensuring there is only one run of the job in the queue at any given time. If a newer run is queued, the scheduler cancels any previously queued run for that job and displays an error message. 
- + To prevent over-scheduling, users will need to take action by either refactoring the job so it runs faster or modifying its [schedule](/docs/deploy/deploy-jobs#schedule-days). diff --git a/website/docs/docs/deploy/monitor-jobs.md b/website/docs/docs/deploy/monitor-jobs.md index 98fe61b4224..45156bb341c 100644 --- a/website/docs/docs/deploy/monitor-jobs.md +++ b/website/docs/docs/deploy/monitor-jobs.md @@ -20,11 +20,11 @@ This portion of our documentation will go over dbt Cloud's various capabilities - + - + - + diff --git a/website/docs/docs/deploy/retry-jobs.md b/website/docs/docs/deploy/retry-jobs.md index db703ff6a38..beefb35379e 100644 --- a/website/docs/docs/deploy/retry-jobs.md +++ b/website/docs/docs/deploy/retry-jobs.md @@ -23,7 +23,7 @@ If your dbt job run completed with a status of **Error**, you can rerun it from If you chose to rerun from the failure point, a **Rerun failed steps** modal opens. The modal lists the run steps that will be invoked: the failed step and any skipped steps. To confirm these run steps, click **Rerun from failure**. The job reruns from the failed command in the previously failed run. A banner at the top of the **Run Summary** tab captures this with the message, "This run resumed execution from last failed step". - + ## Related content - [Retry a failed run for a job](/dbt-cloud/api-v2#/operations/Retry%20Failed%20Job) API endpoint diff --git a/website/docs/docs/deploy/run-visibility.md b/website/docs/docs/deploy/run-visibility.md index 01e5e591b4e..ff9abfa5b0b 100644 --- a/website/docs/docs/deploy/run-visibility.md +++ b/website/docs/docs/deploy/run-visibility.md @@ -17,13 +17,13 @@ dbt Cloud developers can access their run history for the last 365 days through We limit self-service retrieval of run history metadata to 365 days to improve dbt Cloud's performance. For more info on the run history retrieval change, refer to [Older run history retrieval change](/docs/dbt-versions/release-notes/May-2023/run-history-endpoint). 
- + ## Access logs You can view or download in-progress and historical logs for your dbt runs. This makes it easier for the team to debug errors more efficiently. - + ## Model timing > Available on [multi-tenant](/docs/cloud/about-cloud/regions-ip-addresses) dbt Cloud accounts on the [Team or Enterprise plans](https://www.getdbt.com/pricing/). @@ -32,4 +32,4 @@ The model timing dashboard on dbt Cloud displays the composition, order, and tim You can find the dashboard on the **Run Overview** page. - + diff --git a/website/docs/docs/deploy/source-freshness.md b/website/docs/docs/deploy/source-freshness.md index 3c4866cd084..2f9fe6bc007 100644 --- a/website/docs/docs/deploy/source-freshness.md +++ b/website/docs/docs/deploy/source-freshness.md @@ -6,7 +6,7 @@ description: "Validate that data freshness meets expectations and alert if stale dbt Cloud provides a helpful interface around dbt's [source data freshness](/docs/build/sources#snapshotting-source-data-freshness) calculations. When a dbt Cloud job is configured to snapshot source data freshness, dbt Cloud will render a user interface showing you the state of the most recent snapshot. This interface is intended to help you determine if your source data freshness is meeting the service level agreement (SLA) that you've defined for your organization. - + ### Enabling source freshness snapshots @@ -15,7 +15,7 @@ dbt Cloud provides a helpful interface around dbt's [source data freshness](/doc - Select the **Generate docs on run** checkbox to automatically [generate project docs](/docs/collaborate/build-and-view-your-docs#set-up-a-documentation-job). - Select the **Run source freshness** checkbox to enable [source freshness](#checkbox) as the first step of the job. - + To enable source freshness snapshots, firstly make sure to configure your sources to [snapshot freshness information](/docs/build/sources#snapshotting-source-data-freshness). 
You can add source freshness to the list of commands in the job run steps or enable the checkbox. However, you can expect different outcomes when you configure a job by selecting the **Run source freshness** checkbox compared to adding the command to the run steps. @@ -27,7 +27,7 @@ Review the following options and outcomes: | **Add as a run step** | Add the `dbt source freshness` command to a job anywhere in your list of run steps. However, if your source data is out of date — this step will "fail", and subsequent steps will not run. dbt Cloud will trigger email notifications (if configured) based on the end state of this step.

You can create a new job to snapshot source freshness.

If you *do not* want your models to run if your source data is out of date, then it could be a good idea to run `dbt source freshness` as the first step in your job. Otherwise, we recommend adding `dbt source freshness` as the last step in the job, or creating a separate job just for this task. | - + ### Source freshness snapshot frequency diff --git a/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md b/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md index e6a50443837..f41bceab12d 100644 --- a/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md +++ b/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md @@ -16,11 +16,11 @@ New dbt Cloud accounts will automatically be created with a Development Environm To create a development environment, choose **Deploy** > **Environments** from the top left. Then, click **Create Environment**. - + Enter an environment **Name** that would help you identify it among your other environments (for example, `Nate's Development Environment`). Choose **Development** as the **Environment Type**. You can also select which **dbt Version** to use at this time. For compatibility reasons, we recommend that you select the same dbt version that you plan to use in your deployment environment. Finally, click **Save** to finish creating your development environment. - + ### Setting up developer credentials @@ -28,14 +28,14 @@ The IDE uses *developer credentials* to connect to your database. These develope New dbt Cloud accounts should have developer credentials created automatically as a part of Project creation in the initial application setup. - + New users on existing accounts *might not* have their development credentials already configured. To manage your development credentials: 1. 
Navigate to your **Credentials** under **Your Profile** settings, which you can access at `https://YOUR_ACCESS_URL/settings/profile#credentials`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cloud/regions-ip-addresses) for your region and plan. 2. Select the relevant project in the list. After entering your developer credentials, you'll be able to access the dbt IDE. - + ### Compiling and running SQL diff --git a/website/docs/docs/use-dbt-semantic-layer/sl-architecture.md b/website/docs/docs/use-dbt-semantic-layer/sl-architecture.md index 2966aebae64..459fcfc487f 100644 --- a/website/docs/docs/use-dbt-semantic-layer/sl-architecture.md +++ b/website/docs/docs/use-dbt-semantic-layer/sl-architecture.md @@ -17,7 +17,7 @@ import DeprecationNotice from '/snippets/_sl-deprecation-notice.md'; The dbt Semantic Layer allows you to define metrics and use various interfaces to query them. The Semantic Layer does the heavy lifting to find where the queried data exists in your data platform and generates the SQL to make the request (including performing joins). - + ## Components diff --git a/website/docs/faqs/API/rotate-token.md b/website/docs/faqs/API/rotate-token.md index 0b808fa9176..4470de72d5a 100644 --- a/website/docs/faqs/API/rotate-token.md +++ b/website/docs/faqs/API/rotate-token.md @@ -19,7 +19,7 @@ To automatically rotate your API key: 2. Select **API Access** from the lefthand side. 3. In the **API** pane, click `Rotate`. - + diff --git a/website/docs/faqs/Accounts/change-users-license.md b/website/docs/faqs/Accounts/change-users-license.md index ed12ba5dc14..8755b946126 100644 --- a/website/docs/faqs/Accounts/change-users-license.md +++ b/website/docs/faqs/Accounts/change-users-license.md @@ -10,10 +10,10 @@ To change the license type for a user from `developer` to `read-only` or `IT` in 1. From dbt Cloud, click the gear icon at the top right and select **Account Settings**. - + 2. 
In **Account Settings**, select **Users** under **Teams**. 3. Select the user you want to remove, and click **Edit** in the bottom of their profile. 4. For the **License** option, choose **Read-only** or **IT** (from **Developer**), and click **Save**. - + diff --git a/website/docs/faqs/Accounts/cloud-upgrade-instructions.md b/website/docs/faqs/Accounts/cloud-upgrade-instructions.md index ef2ff8e4cd3..d16651a944c 100644 --- a/website/docs/faqs/Accounts/cloud-upgrade-instructions.md +++ b/website/docs/faqs/Accounts/cloud-upgrade-instructions.md @@ -32,7 +32,7 @@ To unlock your account and select a plan, review the following guidance per plan 3. Confirm your plan selection on the pop up message. 4. This automatically unlocks your dbt Cloud account, and you can now enjoy the benefits of the Developer plan. 🎉 - + ### Team plan @@ -42,7 +42,7 @@ To unlock your account and select a plan, review the following guidance per plan 4. Enter your payment details and click **Save**. 5. This automatically unlocks your dbt Cloud account, and you can now enjoy the benefits of the Team plan. 🎉 - + ### Enterprise plan @@ -50,7 +50,7 @@ To unlock your account and select a plan, review the following guidance per plan 2. Click **Contact Sales** on the right. This opens a chat window for you to contact the dbt Cloud Support team, who will connect you to our Sales team. 3. Once you submit your request, our Sales team will contact you with more information. - + 4. Alternatively, you can [contact](https://www.getdbt.com/contact/) our Sales team directly to chat about how dbt Cloud can help you and your team. diff --git a/website/docs/faqs/Accounts/delete-users.md b/website/docs/faqs/Accounts/delete-users.md index 6041eb93d9d..a7e422fd82c 100644 --- a/website/docs/faqs/Accounts/delete-users.md +++ b/website/docs/faqs/Accounts/delete-users.md @@ -10,20 +10,20 @@ To delete a user in dbt Cloud, you must be an account owner or have admin privil 1. 
From dbt Cloud, click the gear icon at the top right and select **Account Settings**. - + 2. In **Account Settings**, select **Users** under **Teams**. 3. Select the user you want to delete, then click **Edit**. 4. Click **Delete** in the bottom left. Click **Confirm Delete** to immediately delete the user without additional password prompts. This action cannot be undone. However, you can re-invite the user with the same information if the deletion was made in error. - + If you are on a **Teams** plan and you are deleting users to reduce the number of billable seats, you also need to take these steps to lower the license count: 1. In **Account Settings**, select **Billing**. 2. Enter the number of developer seats you want and make sure you fill in all the payment details, including the **Billing Address** section. If you leave any field blank, you won't be able to save your changes. 3. Click **Update Payment Information** to save your changes. - + ## Related docs diff --git a/website/docs/faqs/Environments/custom-branch-settings.md b/website/docs/faqs/Environments/custom-branch-settings.md index 6ba2a719ee8..4bc4b85be02 100644 --- a/website/docs/faqs/Environments/custom-branch-settings.md +++ b/website/docs/faqs/Environments/custom-branch-settings.md @@ -28,7 +28,7 @@ For example, if you want to use the `develop` branch of a connected repository: - Enter **develop** as the name of your custom branch - Click **Save** - + ## Deployment diff --git a/website/docs/faqs/Git/git-migration.md b/website/docs/faqs/Git/git-migration.md index 454dd356285..775ae3679e3 100644 --- a/website/docs/faqs/Git/git-migration.md +++ b/website/docs/faqs/Git/git-migration.md @@ -16,7 +16,7 @@ To migrate from one git provider to another, refer to the following steps to avo 2. Go back to dbt Cloud and set up your [integration for the new git provider](/docs/cloud/git/connect-github), if needed. 3. 
Disconnect the old repository in dbt Cloud by going to **Account Settings** and then **Projects**. Click on the **Repository** link, then click **Edit** and **Disconnect**. - + 4. On the same page, connect to the new git provider repository by clicking **Configure Repository** - If you're using the native integration, you may need to OAuth to it. diff --git a/website/docs/faqs/Git/gitignore.md b/website/docs/faqs/Git/gitignore.md index cda3a9d75b9..6bda9611733 100644 --- a/website/docs/faqs/Git/gitignore.md +++ b/website/docs/faqs/Git/gitignore.md @@ -35,7 +35,7 @@ For more info on `gitignore` syntax, refer to the [Git docs](https://git-scm.com 4. Save the changes but _don't commit_. 5. Restart the IDE by clicking on the three dots next to the **IDE Status button** on the lower right corner of the IDE screen and select **Restart IDE**. - + 6. Once the IDE restarts, go to the **File Explorer** to delete the following files or folders (if they exist). No data will be lost: * `target`, `dbt_modules`, `dbt_packages`, `logs` @@ -50,7 +50,7 @@ For more info on `gitignore` syntax, refer to the [Git docs](https://git-scm.com 12. Once the branch has changed, click the **Pull from remote** button to pull in all the changes. 13. Verify the changes by making sure the files/folders in the `.gitignore `file are in italics. - + ### Fix in the git provider @@ -144,7 +144,7 @@ For more info on `gitignore` syntax, refer to the [Git docs](https://git-scm.com 4. Save the changes but _don't commit_. 5. Restart the IDE by clicking on the three dots next to the **IDE Status button** on the lower right corner of the IDE screen and select **Restart IDE**. - + 6. Once the IDE restarts, go to the **File Explorer** to delete the following files or folders (if they exist). No data will be lost: * `target`, `dbt_modules`, `dbt_packages`, `logs` @@ -159,7 +159,7 @@ For more info on `gitignore` syntax, refer to the [Git docs](https://git-scm.com 12. 
Once the branch has changed, click the **Pull from remote** button to pull in all the changes. 13. Verify the changes by making sure the files/folders in the `.gitignore `file are in italics. - + ### Fix in the git provider diff --git a/website/docs/faqs/Project/delete-a-project.md b/website/docs/faqs/Project/delete-a-project.md index 21f16cbfaec..5fde3fee9cd 100644 --- a/website/docs/faqs/Project/delete-a-project.md +++ b/website/docs/faqs/Project/delete-a-project.md @@ -9,10 +9,10 @@ To delete a project in dbt Cloud, you must be the account owner or have admin pr 1. From dbt Cloud, click the gear icon at the top right corner and select **Account Settings**. - + 2. In **Account Settings**, select **Projects**. Click the project you want to delete from the **Projects** page. 3. Click the edit icon in the lower right-hand corner of the **Project Details**. A **Delete** option will appear on the left side of the same details view. 4. Select **Delete**. Confirm the action to immediately delete the user without additional password prompts. There will be no account password prompt, and the project is deleted immediately after confirmation. Once a project is deleted, this action cannot be undone. - + diff --git a/website/docs/faqs/Troubleshooting/gitignore.md b/website/docs/faqs/Troubleshooting/gitignore.md index 2b668a3efb9..59fd4e8c866 100644 --- a/website/docs/faqs/Troubleshooting/gitignore.md +++ b/website/docs/faqs/Troubleshooting/gitignore.md @@ -24,7 +24,7 @@ dbt_modules/ 2. Save your changes but _don't commit_ 3. Restart the IDE by clicking on the three dots next to the **IDE Status button** on the lower right of the IDE. - + 4. Select **Restart IDE**. 5. Go back to your dbt project and delete the following files or folders if you have them: @@ -35,7 +35,7 @@ dbt_modules/ 9. Merge the PR on your git provider page. 10. Switch to your main branch and click on **Pull from remote** to pull in all the changes you made to your main branch. 
You can verify the changes by making sure the files/folders in the .gitignore file are in italics. - + @@ -53,12 +53,12 @@ dbt_modules/ 2. Go to your `dbt_project.yml` file and add `tmp/` after your `target-path:` and add `log-path: "tmp/logs"`. * So it should look like: `target-path: "tmp/target"` and `log-path: "tmp/logs"`: - + 3. Save your changes but _don't commit_. 4. Restart the IDE by clicking on the three dots next to the **IDE Status button** on the lower right of the IDE. - + 5. Select **Restart IDE**. 6. Go back to your dbt project and delete the following four folders (if you have them): @@ -71,7 +71,7 @@ dbt_modules/ * Remove `tmp` from your `target-path` and completely remove the `log-path: "tmp/logs"` line. - + 9. Restart the IDE again. 10. Delete the `tmp` folder in the **File Explorer**. @@ -79,7 +79,7 @@ dbt_modules/ 12. Merge the PR in your git provider page. 13. Switch to your main branch and click on **Pull from remote** to pull in all the changes you made to your main branch. You can verify the changes by making sure the files/folders in the .gitignore file are in italics. - + diff --git a/website/docs/guides/adapter-creation.md b/website/docs/guides/adapter-creation.md index 12bda4726f9..28e0e8253ad 100644 --- a/website/docs/guides/adapter-creation.md +++ b/website/docs/guides/adapter-creation.md @@ -107,7 +107,7 @@ A set of *materializations* and their corresponding helper macros defined in dbt Below is a diagram of how dbt-postgres, the adapter at the center of dbt-core, works. 
- + ## Prerequisites @@ -1225,17 +1225,17 @@ This can vary substantially depending on the nature of the release but a good ba Breaking this down: - Visually distinctive announcement - make it clear this is a release - + - Short written description of what is in the release - + - Links to additional resources - + - Implementation instructions: - + - Future plans - + - Contributor recognition (if applicable) - + ## Verify a new adapter diff --git a/website/docs/guides/bigquery-qs.md b/website/docs/guides/bigquery-qs.md index d961a27018a..4f461a3cf3a 100644 --- a/website/docs/guides/bigquery-qs.md +++ b/website/docs/guides/bigquery-qs.md @@ -56,7 +56,7 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen Click **Run**, then check for results from the queries. For example:
- +
2. Create new datasets from the [BigQuery Console](https://console.cloud.google.com/bigquery). For more information, refer to [Create datasets](https://cloud.google.com/bigquery/docs/datasets#create-dataset) in the Google Cloud docs. Datasets in BigQuery are equivalent to schemas in a traditional database. On the **Create dataset** page: - **Dataset ID** — Enter a name that fits the purpose. This name is used like schema in fully qualified references to your database objects such as `database.schema.table`. As an example for this guide, create one for `jaffle_shop` and another one for `stripe` afterward. @@ -64,7 +64,7 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen - **Enable table expiration** — Leave it unselected (the default). The default for the billing table expiration is 60 days. Because billing isn’t enabled for this project, GCP defaults to deprecating tables. - **Google-managed encryption key** — This option is available under **Advanced options**. Allow Google to manage encryption (the default).
- +
3. After you create the `jaffle_shop` dataset, create one for `stripe` with all the same values except for **Dataset ID**. diff --git a/website/docs/guides/codespace-qs.md b/website/docs/guides/codespace-qs.md index c399eb494a9..b28b0ddaacf 100644 --- a/website/docs/guides/codespace-qs.md +++ b/website/docs/guides/codespace-qs.md @@ -35,7 +35,7 @@ dbt Labs provides a [GitHub Codespace](https://docs.github.com/en/codespaces/ove 1. Click **Code** (at the top of the new repository’s page). Under the **Codespaces** tab, choose **Create codespace on main**. Depending on how you've configured your computer's settings, this either opens a new browser tab with the Codespace development environment with VSCode running in it or opens a new VSCode window with the codespace in it. 1. Wait for the codespace to finish building by waiting for the `postCreateCommand` command to complete; this can take several minutes: - + When this command completes, you can start using the codespace development environment. The terminal the command ran in will close and you will get a prompt in a brand new terminal. diff --git a/website/docs/guides/custom-cicd-pipelines.md b/website/docs/guides/custom-cicd-pipelines.md index 6c1d60c93da..1778098f752 100644 --- a/website/docs/guides/custom-cicd-pipelines.md +++ b/website/docs/guides/custom-cicd-pipelines.md @@ -144,7 +144,7 @@ In Azure: - Click *OK* and then *Save* to save the variable - Save your new Azure pipeline - + @@ -486,9 +486,9 @@ Additionally, you’ll see the job in the run history of dbt Cloud. It should be - + - + diff --git a/website/docs/guides/databricks-qs.md b/website/docs/guides/databricks-qs.md index 98c215382f6..cb01daec394 100644 --- a/website/docs/guides/databricks-qs.md +++ b/website/docs/guides/databricks-qs.md @@ -41,7 +41,7 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen 1. Use your existing account or sign up for a Databricks account at [Try Databricks](https://databricks.com/). 
Complete the form with your user information.
- +
2. This tutorial uses AWS as the cloud provider, but if you use Azure or GCP internally, choose one of them instead. The setup process will be similar. @@ -49,28 +49,28 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen 4. After setting up your password, you will be guided to choose a subscription plan. Select the `Premium` or `Enterprise` plan to access the SQL Compute functionality required for using the SQL warehouse for dbt. We have chosen `Premium` for this tutorial. Click **Continue** after selecting your plan.
- +
5. Click **Get Started** when you reach the page below, then click **Confirm** after you validate that you have everything you need.
- +
- +
6. Now it's time to create your first workspace. A Databricks workspace is an environment for accessing all of your Databricks assets. The workspace organizes objects like notebooks, SQL warehouses, and clusters into one place. Provide the name of your workspace, choose the appropriate AWS region, and click **Start Quickstart**. You might see a checkbox labeled **I have data in S3 that I want to query with Databricks**. You do not need to select it for the purpose of this tutorial.
- +
7. By clicking on `Start Quickstart`, you will be redirected to AWS and asked to log in if you haven’t already. After logging in, you should see a page similar to this.
- +
:::tip @@ -79,16 +79,16 @@ If you get a session error and don’t get redirected to this page, you can go b 8. There is no need to change any of the pre-filled out fields in the Parameters. Just add in your Databricks password under **Databricks Account Credentials**. Check off the Acknowledgement and click **Create stack**.
- +
- +
10. Go back to the Databricks tab. You should see that your workspace is ready to use.
- +
11. Now let’s jump into the workspace. Click **Open** and log into the workspace using the same login as you used to log into the account. @@ -101,7 +101,7 @@ If you get a session error and don’t get redirected to this page, you can go b 2. First we need a SQL warehouse. Find the drop down menu and toggle into the SQL space.
- +
3. We will be setting up a SQL warehouse now. Select **SQL Warehouses** from the left hand side console. You will see that a default SQL Warehouse exists. @@ -109,12 +109,12 @@ If you get a session error and don’t get redirected to this page, you can go b 5. Once the SQL Warehouse is up, click **New** and then **File upload** on the dropdown menu.
- +
6. Let's load the Jaffle Shop Customers data first. Drop the `jaffle_shop_customers.csv` file into the UI.
- +
7. Update the Table Attributes at the top: @@ -128,7 +128,7 @@ If you get a session error and don’t get redirected to this page, you can go b - LAST_NAME = string
- +
8. Click **Create** on the bottom once you’re done. @@ -136,11 +136,11 @@ If you get a session error and don’t get redirected to this page, you can go b 9. Now let’s do the same for `Jaffle Shop Orders` and `Stripe Payments`.
- +
- +
10. Once that's done, make sure you can query the training data. Navigate to the `SQL Editor` through the left hand menu. This will bring you to a query editor. @@ -153,7 +153,7 @@ If you get a session error and don’t get redirected to this page, you can go b ```
- +
12. To ensure any users who might be working on your dbt project has access to your object, run this command. diff --git a/website/docs/guides/dbt-python-snowpark.md b/website/docs/guides/dbt-python-snowpark.md index fce0ad692f6..110445344e9 100644 --- a/website/docs/guides/dbt-python-snowpark.md +++ b/website/docs/guides/dbt-python-snowpark.md @@ -51,19 +51,19 @@ Overall we are going to set up the environments, build scalable pipelines in dbt 1. Log in to your trial Snowflake account. You can [sign up for a Snowflake Trial Account using this form](https://signup.snowflake.com/) if you don’t have one. 2. Ensure that your account is set up using **AWS** in the **US East (N. Virginia)**. We will be copying the data from a public AWS S3 bucket hosted by dbt Labs in the us-east-1 region. By ensuring our Snowflake environment setup matches our bucket region, we avoid any multi-region data copy and retrieval latency issues. - + 3. After creating your account and verifying it from your sign-up email, Snowflake will direct you back to the UI called Snowsight. 4. When Snowsight first opens, your window should look like the following, with you logged in as the ACCOUNTADMIN with demo worksheets open: - + 5. Navigate to **Admin > Billing & Terms**. Click **Enable > Acknowledge & Continue** to enable Anaconda Python Packages to run in Snowflake. - + - + 6. Finally, create a new Worksheet by selecting **+ Worksheet** in the upper right corner. @@ -80,7 +80,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 3. Rename the worksheet to `data setup script` since we will be placing code in this worksheet to ingest the Formula 1 data. Make sure you are still logged in as the **ACCOUNTADMIN** and select the **COMPUTE_WH** warehouse. - + 4. Copy the following code into the main body of the Snowflake worksheet. 
You can also find this setup script under the `setup` folder in the [Git repository](https://github.com/dbt-labs/python-snowpark-formula1/blob/main/setup/setup_script_s3_to_snowflake.sql). The script is long since it's bring in all of the data we'll need today! @@ -233,7 +233,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 5. Ensure all the commands are selected before running the query — an easy way to do this is to use Ctrl-a to highlight all of the code in the worksheet. Select **run** (blue triangle icon). Notice how the dot next to your **COMPUTE_WH** turns from gray to green as you run the query. The **status** table is the final table of all 8 tables loaded in. - + 6. Let’s unpack that pretty long query we ran into component parts. We ran this query to load in our 8 Formula 1 tables from a public S3 bucket. To do this, we: - Created a new database called `formula1` and a schema called `raw` to place our raw (untransformed) data into. @@ -244,7 +244,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 7. Now let's take a look at some of our cool Formula 1 data we just loaded up! 1. Create a new worksheet by selecting the **+** then **New Worksheet**. - + 2. Navigate to **Database > Formula1 > RAW > Tables**. 3. Query the data using the following code. There are only 76 rows in the circuits table, so we don’t need to worry about limiting the amount of data we query. @@ -256,7 +256,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 5. Review the query results, you should see information about Formula 1 circuits, starting with Albert Park in Australia! 6. Finally, ensure you have all 8 tables starting with `CIRCUITS` and ending with `STATUS`. Now we are ready to connect into dbt Cloud! - + ## Configure dbt Cloud @@ -264,19 +264,19 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 2. 
Navigate out of your worksheet back by selecting **home**. 3. In Snowsight, confirm that you are using the **ACCOUNTADMIN** role. 4. Navigate to the **Admin** **> Partner Connect**. Find **dbt** either by using the search bar or navigating the **Data Integration**. Select the **dbt** tile. - + 5. You should now see a new window that says **Connect to dbt**. Select **Optional Grant** and add the `FORMULA1` database. This will grant access for your new dbt user role to the FORMULA1 database. - + 6. Ensure the `FORMULA1` is present in your optional grant before clicking **Connect**.  This will create a dedicated dbt user, database, warehouse, and role for your dbt Cloud trial. - + 7. When you see the **Your partner account has been created** window, click **Activate**. 8. You should be redirected to a dbt Cloud registration page. Fill out the form. Make sure to save the password somewhere for login in the future. - + 9. Select **Complete Registration**. You should now be redirected to your dbt Cloud account, complete with a connection to your Snowflake account, a deployment and a development environment, and a sample job. @@ -286,43 +286,43 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 1. First we are going to change the name of our default schema to where our dbt models will build. By default, the name is `dbt_`. We will change this to `dbt_` to create your own personal development schema. To do this, select **Profile Settings** from the gear icon in the upper right. - + 2. Navigate to the **Credentials** menu and select **Partner Connect Trial**, which will expand the credentials menu. - + 3. Click **Edit** and change the name of your schema from `dbt_` to `dbt_YOUR_NAME` replacing `YOUR_NAME` with your initials and name (`hwatson` is used in the lab screenshots). Be sure to click **Save** for your changes! - + 4. We now have our own personal development schema, amazing! 
When we run our first dbt models they will build into this schema. 5. Let’s open up dbt Cloud’s Integrated Development Environment (IDE) and familiarize ourselves. Choose **Develop** at the top of the UI. 6. When the IDE is done loading, click **Initialize dbt project**. The initialization process creates a collection of files and folders necessary to run your dbt project. - + 7. After the initialization is finished, you can view the files and folders in the file tree menu. As we move through the workshop we'll be sure to touch on a few key files and folders that we'll work with to build out our project. 8. Next click **Commit and push** to commit the new files and folders from the initialize step. We always want our commit messages to be relevant to the work we're committing, so be sure to provide a message like `initialize project` and select **Commit Changes**. - + - + 9. [Committing](https://www.atlassian.com/git/tutorials/saving-changes/git-commit) your work here will save it to the managed git repository that was created during the Partner Connect signup. This initial commit is the only commit that will be made directly to our `main` branch and from *here on out we'll be doing all of our work on a development branch*. This allows us to keep our development work separate from our production code. 10. There are a couple of key features to point out about the IDE before we get to work. It is a text editor, an SQL and Python runner, and a CLI with Git version control all baked into one package! This allows you to focus on editing your SQL and Python files, previewing the results with the SQL runner (it even runs Jinja!), and building models at the command line without having to move between different applications. The Git workflow in dbt Cloud allows both Git beginners and experts alike to be able to easily version control all of their work with a couple clicks. - + 11. Let's run our first dbt models! 
Two example models are included in your dbt project in the `models/examples` folder that we can use to illustrate how to run dbt at the command line. Type `dbt run` into the command line and click **Enter** on your keyboard. When the run bar expands you'll be able to see the results of the run, where you should see the run complete successfully. - + 12. The run results allow you to see the code that dbt compiles and sends to Snowflake for execution. To view the logs for this run, select one of the model tabs using the  **>** icon and then **Details**. If you scroll down a bit you'll be able to see the compiled code and how dbt interacts with Snowflake. Given that this run took place in our development environment, the models were created in your development schema. - + 13. Now let's switch over to Snowflake to confirm that the objects were actually created. Click on the three dots **…** above your database objects and then **Refresh**. Expand the **PC_DBT_DB** database and you should see your development schema. Select the schema, then **Tables**  and **Views**. Now you should be able to see `MY_FIRST_DBT_MODEL` as a table and `MY_SECOND_DBT_MODEL` as a view. - + ## Create branch and set up project configs @@ -414,15 +414,15 @@ dbt Labs has developed a [project structure guide](/best-practices/how-we-struct 1. In your file tree, use your cursor and hover over the `models` subdirectory, click the three dots **…** that appear to the right of the folder name, then select **Create Folder**. We're going to add two new folders to the file path, `staging` and `formula1` (in that order) by typing `staging/formula1` into the file path. - - + + - If you click into your `models` directory now, you should see the new `staging` folder nested within `models` and the `formula1` folder nested within `staging`. 2. Create two additional folders the same as the last step. Within the `models` subdirectory, create new directories `marts/core`. 3. 
We will need to create a few more folders and subfolders using the UI. After you create all the necessary folders, your folder tree should look like this when it's all done: - + Remember you can always reference the entire project in [GitHub](https://github.com/dbt-labs/python-snowpark-formula1/tree/python-formula1) to view the complete folder and file strucutre. @@ -742,21 +742,21 @@ The next step is to set up the staging models for each of the 8 source tables. G After the source and all the staging models are complete for each of the 8 tables, your staging folder should look like this: - + 1. It’s a good time to delete our example folder since these two models are extraneous to our formula1 pipeline and `my_first_model` fails a `not_null` test that we won’t spend time investigating. dbt Cloud will warn us that this folder will be permanently deleted, and we are okay with that so select **Delete**. - + 1. Now that the staging models are built and saved, it's time to create the models in our development schema in Snowflake. To do this we're going to enter into the command line `dbt build` to run all of the models in our project, which includes the 8 new staging models and the existing example models. Your run should complete successfully and you should see green checkmarks next to all of your models in the run results. We built our 8 staging models as views and ran 13 source tests that we configured in the `f1_sources.yml` file with not that much code, pretty cool! - + Let's take a quick look in Snowflake, refresh database objects, open our development schema, and confirm that the new models are there. If you can see them, then we're good to go! - + Before we move onto the next section, be sure to commit your new models to your Git branch. Click **Commit and push** and give your commit a message like `profile, sources, and staging setup` before moving on. @@ -1055,7 +1055,7 @@ By now, we are pretty good at creating new files in the correct directories so w 1. 
Let’s talk about our lineage so far. It’s looking good 😎. We’ve shown how SQL can be used to make data type, column name changes, and handle hierarchical joins really well; all while building out our automated lineage! - + 1. Time to **Commit and push** our changes and give your commit a message like `intermediate and fact models` before moving on. @@ -1128,7 +1128,7 @@ First, we want to find out: which constructor had the fastest pit stops in 2021? - The `snowflake-snowpark-python` library has been picked up to execute our Python code. Even though this wasn’t explicitly stated this is picked up by the dbt class object because we need our Snowpark package to run Python! Python models take a bit longer to run than SQL models, however we could always speed this up by using [Snowpark-optimized Warehouses](https://docs.snowflake.com/en/user-guide/warehouses-snowpark-optimized.html) if we wanted to. Our data is sufficiently small, so we won’t worry about creating a separate warehouse for Python versus SQL files today. - + The rest of our **Details** output gives us information about how dbt and Snowpark for Python are working together to define class objects and apply a specific set of methods to run our models. @@ -1142,7 +1142,7 @@ First, we want to find out: which constructor had the fastest pit stops in 2021? ``` and preview the output: - + Not only did Red Bull have the fastest average pit stops by nearly 40 seconds, they also had the smallest standard deviation, meaning they are both fastest and most consistent teams in pit stops. By using the `.describe()` method we were able to avoid verbose SQL requiring us to create a line of code per column and repetitively use the `PERCENTILE_COUNT()` function. @@ -1187,7 +1187,7 @@ First, we want to find out: which constructor had the fastest pit stops in 2021? in the command bar. 12. Once again previewing the output of our data using the same steps for our `fastest_pit_stops_by_constructor` model. 
- + We can see that it looks like lap times are getting consistently faster over time. Then in 2010 we see an increase occur! Using outside subject matter context, we know that significant rule changes were introduced to Formula 1 in 2010 and 2011 causing slower lap times. @@ -1314,7 +1314,7 @@ At a high level we’ll be: - The `.apply()` function in the pandas library is used to apply a function to a specified axis of a DataFrame or a Series. In our case the function we used was our lambda function! - The `.apply()` function takes two arguments: the first is the function to be applied, and the second is the axis along which the function should be applied. The axis can be specified as 0 for rows or 1 for columns. We are using the default value of 0 so we aren’t explicitly writing it in the code. This means that the function will be applied to each *row* of the DataFrame or Series. 6. Let’s look at the preview of our clean dataframe after running our `ml_data_prep` model: - + ### Covariate encoding @@ -1565,7 +1565,7 @@ If you haven’t seen code like this before or use joblib files to save machine - Right now our model is only in memory, so we need to use our nifty function `save_file` to save our model file to our Snowflake stage. We save our model as a joblib file so Snowpark can easily call this model object back to create predictions. We really don’t need to know much else as a data practitioner unless we want to. It’s worth noting that joblib files aren’t able to be queried directly by SQL. To do this, we would need to transform the joblib file to an SQL querable format such as JSON or CSV (out of scope for this workshop). - Finally we want to return our dataframe, but create a new column indicating what rows were used for training and those for training. 5. Viewing our output of this model: - + 6. 
Let’s pop back over to Snowflake and check that our logistic regression model has been stored in our `MODELSTAGE` using the command: @@ -1573,10 +1573,10 @@ If you haven’t seen code like this before or use joblib files to save machine list @modelstage ``` - + 7. To investigate the commands run as part of `train_test_position` script, navigate to Snowflake query history to view it **Activity > Query History**. We can view the portions of query that we wrote such as `create or replace stage MODELSTAGE`, but we also see additional queries that Snowflake uses to interpret python code. - + ### Predicting on new data @@ -1731,7 +1731,7 @@ Since the output of our Python models are tables, we can test SQL and Python mod ### Generic tests 1. To implement generic out-of-the-box tests dbt comes with, we can use YAML files to specify information about our models. To add generic tests to our aggregates model, create a file called `aggregates.yml`, copy the code block below into the file, and save. - + ```yaml version: 2 @@ -1762,7 +1762,7 @@ Since the output of our Python models are tables, we can test SQL and Python mod ### Using macros for testing 1. Under your `macros` folder, create a new file and name it `test_all_values_gte_zero.sql`. Copy the code block below and save the file. For clarity, “gte” is an abbreviation for greater than or equal to. - + ```sql {% macro test_all_values_gte_zero(table, column) %} @@ -1776,7 +1776,7 @@ Since the output of our Python models are tables, we can test SQL and Python mod 3. We use the `{% macro %}` to indicate the start of the macro and `{% endmacro %}` for the end. The text after the beginning of the macro block is the name we are giving the macro to later call it. In this case, our macro is called `test_all_values_gte_zero`. Macros take in *arguments* to pass through, in this case the `table` and the `column`. 
In the body of the macro, we see an SQL statement that is using the `ref` function to dynamically select the table and then the column. You can always view macros without having to run them by using `dbt run-operation`. You can learn more [here](https://docs.getdbt.com/reference/commands/run-operation). 4. Great, now we want to reference this macro as a test! Let’s create a new test file called `macro_pit_stops_mean_is_positive.sql` in our `tests` folder. - + 5. Copy the following code into the file and save: @@ -1805,7 +1805,7 @@ These tests are defined in `.sql` files, typically in your `tests` directory (as Let’s add a custom test that asserts that the moving average of the lap time over the last 5 years is greater than zero (it’s impossible to have time less than 0!). It is easy to assume if this is not the case the data has been corrupted. 1. Create a file `lap_times_moving_avg_assert_positive_or_null.sql` under the `tests` folder. - + 2. Copy the following code and save the file: @@ -1841,11 +1841,11 @@ Let’s add a custom test that asserts that the moving average of the lap time o dbt test --select fastest_pit_stops_by_constructor lap_times_moving_avg ``` - + 3. All 4 of our tests passed (yay for clean data)! To understand the SQL being run against each of our tables, we can click into the details of the test. 4. Navigating into the **Details** of the `unique_fastest_pit_stops_by_constructor_name`, we can see that each line `constructor_name` should only have one row. - + ## Document your dbt project @@ -1865,17 +1865,17 @@ To start, let’s look back at our `intermediate.md` file. We can see that we pr ``` This will generate the documentation for your project. Click the book button, as shown in the screenshot below to access the docs. - + 2. Go to our project area and view `int_results`. View the description that we created in our doc block. - + 3. View the mini-lineage that looks at the model we are currently selected on (`int_results` in this case). - + 4. 
In our `dbt_project.yml`, we configured `node_colors` depending on the file directory. Starting in dbt v1.3, we can see how our lineage in our docs looks. By color coding your project, it can help you cluster together similar models or steps and more easily troubleshoot. - + ## Deploy your code @@ -1890,18 +1890,18 @@ Now that we've completed testing and documenting our work, we're ready to deploy 1. Before getting started, let's make sure that we've committed all of our work to our feature branch. If you still have work to commit, you'll be able to select the **Commit and push**, provide a message, and then select **Commit** again. 2. Once all of your work is committed, the git workflow button will now appear as **Merge to main**. Select **Merge to main** and the merge process will automatically run in the background. - + 3. When it's completed, you should see the git button read **Create branch** and the branch you're currently looking at will become **main**. 4. Now that all of our development work has been merged to the main branch, we can build our deployment job. Given that our production environment and production job were created automatically for us through Partner Connect, all we need to do here is update some default configurations to meet our needs. 5. In the menu, select **Deploy** **> Environments** - + 6. You should see two environments listed and you'll want to select the **Deployment** environment then **Settings** to modify it. 7. Before making any changes, let's touch on what is defined within this environment. The Snowflake connection shows the credentials that dbt Cloud is using for this environment and in our case they are the same as what was created for us through Partner Connect. Our deployment job will build in our `PC_DBT_DB` database and use the default Partner Connect role and warehouse to do so. The deployment credentials section also uses the info that was created in our Partner Connect job to create the credential connection. 
However, it is using the same default schema that we've been using as the schema for our development environment. 8. Let's update the schema to create a new schema specifically for our production environment. Click **Edit** to allow you to modify the existing field values. Navigate to **Deployment Credentials >** **schema.** 9. Update the schema name to **production**. Remember to select **Save** after you've made the change. - + 10. By updating the schema for our production environment to **production**, it ensures that our deployment job for this environment will build our dbt models in the **production** schema within the `PC_DBT_DB` database as defined in the Snowflake Connection section. 11. Now let's switch over to our production job. Click on the deploy tab again and then select **Jobs**. You should see an existing and preconfigured **Partner Connect Trial Job**. Similar to the environment, click on the job, then select **Settings** to modify it. Let's take a look at the job to understand it before making changes. @@ -1912,11 +1912,11 @@ Now that we've completed testing and documenting our work, we're ready to deploy So, what are we changing then? Just the name! Click **Edit** to allow you to make changes. Then update the name of the job to **Production Job** to denote this as our production deployment job. After that's done, click **Save**. 12. Now let's go to run our job. Clicking on the job name in the path at the top of the screen will take you back to the job run history page where you'll be able to click **Run run** to kick off the job. If you encounter any job failures, try running the job again before further troubleshooting. - - + + 13. Let's go over to Snowflake to confirm that everything built as expected in our production schema. Refresh the database objects in your Snowflake account and you should see the production schema now within our default Partner Connect database. 
If you click into the schema and everything ran successfully, you should be able to see all of the models we developed. - + ### Conclusion diff --git a/website/docs/guides/dremio-lakehouse.md b/website/docs/guides/dremio-lakehouse.md index b5b020dd768..378ec857f6a 100644 --- a/website/docs/guides/dremio-lakehouse.md +++ b/website/docs/guides/dremio-lakehouse.md @@ -143,7 +143,7 @@ dremioSamples: Now that you have a running environment and a completed job, you can view the data in Dremio and expand your code. This is a snapshot of the project structure in an IDE: - + ## About the schema.yml @@ -156,7 +156,7 @@ The models correspond to both weather and trip data respectively and will be joi The sources can be found by navigating to the **Object Storage** section of the Dremio Cloud UI. - + ## About the models @@ -170,11 +170,11 @@ The sources can be found by navigating to the **Object Storage** section of the When you run the dbt job, it will create a **dev** space folder that has all the data assets created. This is what you will see in Dremio Cloud UI. Spaces in Dremio is a way to organize data assets which map to business units or data products. - + Open the **Application folder** and you will see the output of the simple transformation we did using dbt. - + ## Query the data @@ -191,6 +191,6 @@ GROUP BY vendor_id ``` - + This completes the integration setup and data is ready for business consumption. diff --git a/website/docs/guides/manual-install-qs.md b/website/docs/guides/manual-install-qs.md index 53cf154d09e..fcd1e5e9599 100644 --- a/website/docs/guides/manual-install-qs.md +++ b/website/docs/guides/manual-install-qs.md @@ -67,7 +67,7 @@ $ pwd 5. Use a code editor like Atom or VSCode to open the project directory you created in the previous steps, which we named jaffle_shop. The content includes folders and `.sql` and `.yml` files generated by the `init` command.
- +
6. dbt provides the following values in the `dbt_project.yml` file: @@ -126,7 +126,7 @@ $ dbt debug ```
- +
### FAQs @@ -150,7 +150,7 @@ dbt run You should have an output that looks like this:
- +
## Commit your changes @@ -197,7 +197,7 @@ $ git checkout -b add-customers-model 4. From the command line, enter `dbt run`.
- +
When you return to the BigQuery console, you can `select` from this model. @@ -463,6 +463,6 @@ We recommend using dbt Cloud as the easiest and most reliable way to [deploy job For more info on how to get started, refer to [create and schedule jobs](/docs/deploy/deploy-jobs#create-and-schedule-jobs). - + For more information about using dbt Core to schedule a job, refer [dbt airflow](/blog/dbt-airflow-spiritual-alignment) blog post. diff --git a/website/docs/guides/microsoft-fabric-qs.md b/website/docs/guides/microsoft-fabric-qs.md index 2d2dd738c42..1d1e016a6f1 100644 --- a/website/docs/guides/microsoft-fabric-qs.md +++ b/website/docs/guides/microsoft-fabric-qs.md @@ -41,7 +41,7 @@ A public preview of Microsoft Fabric in dbt Cloud is now available! 1. Log in to your [Microsoft Fabric](http://app.fabric.microsoft.com) account. 2. On the home page, select the **Synapse Data Warehouse** tile. - + 3. From **Workspaces** on the left sidebar, navigate to your organization’s workspace. Or, you can create a new workspace; refer to [Create a workspace](https://learn.microsoft.com/en-us/fabric/get-started/create-workspaces) in the Microsoft docs for more details. 4. Choose your warehouse from the table. Or, you can create a new warehouse; refer to [Create a warehouse](https://learn.microsoft.com/en-us/fabric/data-warehouse/tutorial-create-warehouse) in the Microsoft docs for more details. @@ -100,7 +100,7 @@ A public preview of Microsoft Fabric in dbt Cloud is now available! 
); ``` - + ## Connect dbt Cloud to Microsoft Fabric diff --git a/website/docs/guides/productionize-your-dbt-databricks-project.md b/website/docs/guides/productionize-your-dbt-databricks-project.md index 456c69dcb87..3584cffba77 100644 --- a/website/docs/guides/productionize-your-dbt-databricks-project.md +++ b/website/docs/guides/productionize-your-dbt-databricks-project.md @@ -105,7 +105,7 @@ The [run history](/docs/deploy/run-visibility#run-history) dashboard in dbt Clou The deployment monitor in dbt Cloud offers a higher-level view of your run history, enabling you to gauge the health of your data pipeline over an extended period of time. This feature includes information on run durations and success rates, allowing you to identify trends in job performance, such as increasing run times or more frequent failures. The deployment monitor also highlights jobs in progress, queued, and recent failures. To access the deployment monitor click on the dbt logo in the top left corner of the dbt Cloud UI. - + By adding [status tiles](/docs/deploy/dashboard-status-tiles) to your BI dashboards, you can give stakeholders visibility into the health of your data pipeline without leaving their preferred interface. Status tiles instill confidence in your data and help prevent unnecessary inquiries or context switching. To implement dashboard status tiles, you'll need to have dbt docs with [exposures](/docs/build/exposures) defined. diff --git a/website/docs/guides/redshift-qs.md b/website/docs/guides/redshift-qs.md index 5f3395acb82..c81a4d247a5 100644 --- a/website/docs/guides/redshift-qs.md +++ b/website/docs/guides/redshift-qs.md @@ -43,17 +43,17 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen 3. Click **Next** for each page until you reach the **Select acknowledgement** checkbox. Select **I acknowledge that AWS CloudFormation might create IAM resources with custom names** and click **Create Stack**. 
You should land on the stack page with a CREATE_IN_PROGRESS status. - + 4. When the stack status changes to CREATE_COMPLETE, click the **Outputs** tab on the top to view information that you will use throughout the rest of this guide. Save those credentials for later by keeping this open in a tab. 5. Type `Redshift` in the search bar at the top and click **Amazon Redshift**. - + 6. Confirm that your new Redshift cluster is listed in **Cluster overview**. Select your new cluster. The cluster name should begin with `dbtredshiftcluster-`. Then, click **Query Data**. You can choose the classic query editor or v2. We will be using the v2 version for the purpose of this guide. - + 7. You might be asked to Configure account. For this sandbox environment, we recommend selecting “Configure account”. @@ -63,9 +63,9 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen - **User name** — `dbtadmin` - **Password** — Use the autogenerated `RSadminpassword` from the output of the stack and save it for later. - + - + 9. Click **Create connection**. @@ -80,15 +80,15 @@ Now we are going to load our sample data into the S3 bucket that our Cloudformat 2. Now we are going to use the S3 bucket that you created with CloudFormation and upload the files. Go to the search bar at the top and type in `S3` and click on S3. There will be sample data in the bucket already, feel free to ignore it or use it for other modeling exploration. The bucket will be prefixed with `dbt-data-lake`. - + 3. Click on the `name of the bucket` S3 bucket. If you have multiple S3 buckets, this will be the bucket that was listed under “Workshopbucket” on the Outputs page. - + 4. Click **Upload**. Drag the three files into the UI and click the **Upload** button. - + 5. Remember the name of the S3 bucket for later. It should look like this: `s3://dbt-data-lake-xxxx`. You will need it for the next section. 6. Now let’s go back to the Redshift query editor. 
Search for Redshift in the search bar, choose your cluster, and select Query data. @@ -171,7 +171,7 @@ Now we are going to load our sample data into the S3 bucket that our Cloudformat - **Port** — `5439` - **Database** — `dbtworkshop`.
- +
5. Set your development credentials. These credentials will be used by dbt Cloud to connect to Redshift. Those credentials (as provided in your CloudFormation output) will be: @@ -179,7 +179,7 @@ Now we are going to load our sample data into the S3 bucket that our Cloudformat - **Password** — This is the autogenerated password that you used earlier in the guide - **Schema** — dbt Cloud automatically generates a schema name for you. By convention, this is `dbt_`. This is the schema connected directly to your development environment, and it's where your models will be built when running dbt within the Cloud IDE.
- +
6. Click **Test Connection**. This verifies that dbt Cloud can access your Redshift cluster. diff --git a/website/docs/guides/refactoring-legacy-sql.md b/website/docs/guides/refactoring-legacy-sql.md index b12baac95cd..a339e523020 100644 --- a/website/docs/guides/refactoring-legacy-sql.md +++ b/website/docs/guides/refactoring-legacy-sql.md @@ -44,7 +44,7 @@ While refactoring you'll be **moving around** a lot of logic, but ideally you wo To get going, you'll copy your legacy SQL query into your dbt project, by saving it in a `.sql` file under the `/models` directory of your project. - + Once you've copied it over, you'll want to `dbt run` to execute the query and populate the in your warehouse. @@ -76,7 +76,7 @@ If you're migrating multiple stored procedures into dbt, with sources you can se This allows you to consolidate modeling work on those base tables, rather than calling them separately in multiple places. - + #### Build the habit of analytics-as-code Sources are an easy way to get your feet wet using config files to define aspects of your transformation pipeline. diff --git a/website/docs/guides/set-up-ci.md b/website/docs/guides/set-up-ci.md index aa4811d9339..89d7c5a14fa 100644 --- a/website/docs/guides/set-up-ci.md +++ b/website/docs/guides/set-up-ci.md @@ -22,7 +22,7 @@ After that, there's time to get fancy, but let's walk before we run. In this guide, we're going to add a **CI environment**, where proposed changes can be validated in the context of the entire project without impacting production systems. We will use a single set of deployment credentials (like the Prod environment), but models are built in a separate location to avoid impacting others (like the Dev environment). Your git flow will look like this: - + ### Prerequisites @@ -309,7 +309,7 @@ The team at Sunrun maintained a SOX-compliant deployment in dbt while reducing t In this section, we will add a new **QA** environment. 
New features will branch off from and be merged back into the associated `qa` branch, and a member of your team (the "Release Manager") will create a PR against `main` to be validated in the CI environment before going live. The git flow will look like this: - + ### Advanced prerequisites @@ -323,7 +323,7 @@ As noted above, this branch will outlive any individual feature, and will be the See [Custom branch behavior](/docs/dbt-cloud-environments#custom-branch-behavior). Setting `qa` as your custom branch ensures that the IDE creates new branches and PRs with the correct target, instead of using `main`. - + ### 3. Create a new QA environment diff --git a/website/docs/guides/snowflake-qs.md b/website/docs/guides/snowflake-qs.md index 492609c9bcf..0401c37871f 100644 --- a/website/docs/guides/snowflake-qs.md +++ b/website/docs/guides/snowflake-qs.md @@ -143,35 +143,35 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno 1. In the Snowflake UI, click on the home icon in the upper left corner. In the left sidebar, select **Admin**. Then, select **Partner Connect**. Find the dbt tile by scrolling or by searching for dbt in the search bar. Click the tile to connect to dbt. - + If you’re using the classic version of the Snowflake UI, you can click the **Partner Connect** button in the top bar of your account. From there, click on the dbt tile to open up the connect box. - + 2. In the **Connect to dbt** popup, find the **Optional Grant** option and select the **RAW** and **ANALYTICS** databases. This will grant access for your new dbt user role to each database. Then, click **Connect**. - + - + 3. Click **Activate** when a popup appears: - + - + 4. After the new tab loads, you will see a form. If you already created a dbt Cloud account, you will be asked to provide an account name. If you haven't created account, you will be asked to provide an account name and password. - + 5. 
After you have filled out the form and clicked **Complete Registration**, you will be logged into dbt Cloud automatically. 6. From your **Account Settings** in dbt Cloud (using the gear menu in the upper right corner), choose the "Partner Connect Trial" project and select **snowflake** in the overview table. Select edit and update the fields **Database** and **Warehouse** to be `analytics` and `transforming`, respectively. - + - +
@@ -181,7 +181,7 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno 2. Enter a project name and click **Continue**. 3. For the warehouse, click **Snowflake** then **Next** to set up your connection. - + 4. Enter your **Settings** for Snowflake with: * **Account** — Find your account by using the Snowflake trial account URL and removing `snowflakecomputing.com`. The order of your account information will vary by Snowflake version. For example, Snowflake's Classic console URL might look like: `oq65696.west-us-2.azure.snowflakecomputing.com`. The AppUI or Snowsight URL might look more like: `snowflakecomputing.com/west-us-2.azure/oq65696`. In both examples, your account will be: `oq65696.west-us-2.azure`. For more information, see [Account Identifiers](https://docs.snowflake.com/en/user-guide/admin-account-identifier.html) in the Snowflake docs. @@ -192,7 +192,7 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno * **Database** — `analytics`. This tells dbt to create new models in the analytics database. * **Warehouse** — `transforming`. This tells dbt to use the transforming warehouse that was created earlier. - + 5. Enter your **Development Credentials** for Snowflake with: * **Username** — The username you created for Snowflake. The username is not your email address and is usually your first and last name together in one word. @@ -201,7 +201,7 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno * **Target name** — Leave as the default. * **Threads** — Leave as 4. This is the number of simultaneous connects that dbt Cloud will make to build models concurrently. - + 6. Click **Test Connection**. This verifies that dbt Cloud can access your Snowflake account. 7. If the connection test succeeds, click **Next**. If it fails, you may need to check your Snowflake settings and credentials. 
diff --git a/website/docs/guides/starburst-galaxy-qs.md b/website/docs/guides/starburst-galaxy-qs.md index 9a6c44574cd..1822c83fa90 100644 --- a/website/docs/guides/starburst-galaxy-qs.md +++ b/website/docs/guides/starburst-galaxy-qs.md @@ -92,11 +92,11 @@ In addition to Amazon S3, Starburst Galaxy supports many other data sources. To The **Amazon S3** page should look similar to this, except for the **Authentication to S3** section which is dependant on your setup: - + 8. Click **Test connection**. This verifies that Starburst Galaxy can access your S3 bucket. 9. Click **Connect catalog** if the connection test passes. - + 10. On the **Set permissions** page, click **Skip**. You can add permissions later if you want. 11. On the **Add to cluster** page, choose the cluster you want to add the catalog to from the dropdown and click **Add to cluster**. @@ -113,7 +113,7 @@ In addition to Amazon S3, Starburst Galaxy supports many other data sources. To When done, click **Add privileges**. - + ## Create tables with Starburst Galaxy To query the Jaffle Shop data with Starburst Galaxy, you need to create tables using the Jaffle Shop data that you [loaded to your S3 bucket](#load-data-to-s3). You can do this (and run any SQL statement) from the [query editor](https://docs.starburst.io/starburst-galaxy/query/query-editor.html). @@ -121,7 +121,7 @@ To query the Jaffle Shop data with Starburst Galaxy, you need to create tables u 1. Click **Query > Query editor** on the left sidebar of the Starburst Galaxy UI. The main body of the page is now the query editor. 2. Configure the query editor so it queries your S3 bucket. In the upper right corner of the query editor, select your cluster in the first gray box and select your catalog in the second gray box: - + 3. Copy and paste these queries into the query editor. Then **Run** each query individually. @@ -181,7 +181,7 @@ To query the Jaffle Shop data with Starburst Galaxy, you need to create tables u ``` 4. 
When the queries are done, you can see the following hierarchy on the query editor's left sidebar: - + 5. Verify that the tables were created successfully. In the query editor, run the following queries: diff --git a/website/docs/reference/commands/clone.md b/website/docs/reference/commands/clone.md index 651d0c36908..6bdc2c02e07 100644 --- a/website/docs/reference/commands/clone.md +++ b/website/docs/reference/commands/clone.md @@ -49,7 +49,7 @@ You can clone nodes between states in dbt Cloud using the `dbt clone` command. T - Set up your **Production environment** and have a successful job run. - Enable **Defer to production** by toggling the switch in the lower-right corner of the command bar. - + - Run the `dbt clone` command from the command bar. diff --git a/website/docs/reference/node-selection/graph-operators.md b/website/docs/reference/node-selection/graph-operators.md index 88d99d7b92a..8cba43e1b52 100644 --- a/website/docs/reference/node-selection/graph-operators.md +++ b/website/docs/reference/node-selection/graph-operators.md @@ -29,7 +29,7 @@ dbt run --select "3+my_model+4" # select my_model, its parents up to the ### The "at" operator The `@` operator is similar to `+`, but will also include _the parents of the children of the selected model_. This is useful in continuous integration environments where you want to build a model and all of its children, but the _parents_ of those children might not exist in the database yet. The selector `@snowplow_web_page_context` will build all three models shown in the diagram below. 
- + ```bash dbt run --models @my_model # select my_model, its children, and the parents of its children diff --git a/website/docs/reference/resource-configs/bigquery-configs.md b/website/docs/reference/resource-configs/bigquery-configs.md index a5198fd3487..8f323bc4236 100644 --- a/website/docs/reference/resource-configs/bigquery-configs.md +++ b/website/docs/reference/resource-configs/bigquery-configs.md @@ -379,7 +379,7 @@ models: - + ### Specifying tags BigQuery table and view *tags* can be created by supplying an empty string for the label value. diff --git a/website/docs/reference/resource-configs/persist_docs.md b/website/docs/reference/resource-configs/persist_docs.md index 481f25d4e95..15b1e0bdb40 100644 --- a/website/docs/reference/resource-configs/persist_docs.md +++ b/website/docs/reference/resource-configs/persist_docs.md @@ -186,8 +186,8 @@ models: Run dbt and observe that the created relation and columns are annotated with your descriptions: - - diff --git a/website/docs/reference/resource-configs/spark-configs.md b/website/docs/reference/resource-configs/spark-configs.md index 5c32fa5fc83..ce3b317f0f1 100644 --- a/website/docs/reference/resource-configs/spark-configs.md +++ b/website/docs/reference/resource-configs/spark-configs.md @@ -104,7 +104,7 @@ If no `partition_by` is specified, then the `insert_overwrite` strategy will ato - This strategy is not available when connecting via Databricks SQL endpoints (`method: odbc` + `endpoint`). - If connecting via a Databricks cluster + ODBC driver (`method: odbc` + `cluster`), you **must** include `set spark.sql.sources.partitionOverwriteMode DYNAMIC` in the [cluster Spark Config](https://docs.databricks.com/clusters/configure.html#spark-config) in order for dynamic partition replacement to work (`incremental_strategy: insert_overwrite` + `partition_by`). - + + If mixing images and text together, also consider using a docs block. 
diff --git a/website/docs/terms/dag.md b/website/docs/terms/dag.md index b108c68806a..c6b91300bfc 100644 --- a/website/docs/terms/dag.md +++ b/website/docs/terms/dag.md @@ -32,7 +32,7 @@ One of the great things about DAGs is that they are *visual*. You can clearly id Take this mini-DAG for an example: - + What can you learn from this DAG? Immediately, you may notice a handful of things: @@ -57,7 +57,7 @@ You can additionally use your DAG to help identify bottlenecks, long-running dat ...to name just a few. Understanding the factors impacting model performance can help you decide on [refactoring approaches](https://courses.getdbt.com/courses/refactoring-sql-for-modularity), [changing model materialization](https://docs.getdbt.com/blog/how-we-shaved-90-minutes-off-model#attempt-2-moving-to-an-incremental-model)s, replacing multiple joins with surrogate keys, or other methods. - + ### Modular data modeling best practices @@ -83,7 +83,7 @@ The marketing team at dbt Labs would be upset with us if we told you we think db Whether you’re using dbt Core or Cloud, dbt docs and the Lineage Graph are available to all dbt developers. The Lineage Graph in dbt Docs can show a model or source’s entire lineage, all within a visual frame. Clicking within a model, you can view the Lineage Graph and adjust selectors to only show certain models within the DAG. Analyzing the DAG here is a great way to diagnose potential inefficiencies or lack of modularity in your dbt project. - + The DAG is also [available in the dbt Cloud IDE](https://www.getdbt.com/blog/on-dags-hierarchies-and-ides/), so you and your team can refer to your lineage while you build your models. 
diff --git a/website/docs/terms/data-lineage.md b/website/docs/terms/data-lineage.md index 1dbda6e6b67..d0162c35616 100644 --- a/website/docs/terms/data-lineage.md +++ b/website/docs/terms/data-lineage.md @@ -63,13 +63,13 @@ In the greater data world, you may often hear of data lineage systems based on t If you use a transformation tool such as dbt that automatically infers relationships between data sources and models, a DAG automatically populates to show you the lineage that exists for your [data transformations](https://www.getdbt.com/analytics-engineering/transformation/). - + Your is used to visually show upstream dependencies, the nodes that must come before a current model, and downstream relationships, the work that is impacted by the current model. DAGs are also directional—they show a defined flow of movement and form non-cyclical loops. Ultimately, DAGs are an effective way to see relationships between data sources, models, and dashboards. DAGs are also a great way to see visual bottlenecks, or inefficiencies in your data work (see image below for a DAG with...many bottlenecks). Data teams can additionally add [meta fields](https://docs.getdbt.com/reference/resource-configs/meta) and documentation to nodes in the DAG to add an additional layer of governance to their dbt project. - + :::tip Automatic > Manual diff --git a/website/snippets/_cloud-environments-info.md b/website/snippets/_cloud-environments-info.md index f1b9924b83c..56030e5fce7 100644 --- a/website/snippets/_cloud-environments-info.md +++ b/website/snippets/_cloud-environments-info.md @@ -55,7 +55,7 @@ Extended Attributes is a text box extension at the environment level that overri Something to note, Extended Attributes doesn't mask secret values. We recommend avoiding setting secret values to prevent visibility in the text box and logs. -
+
If you're developing in the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud), [dbt Cloud CLI](/docs/cloud/cloud-cli-installation), or [orchestrating job runs](/docs/deploy/deployments), Extended Attributes parses through the provided YAML and extracts the `profiles.yml` attributes. For each individual attribute: @@ -90,7 +90,7 @@ dbt Cloud will use the cached copy of your project's Git repo under these circum To enable Git repository caching, select **Account settings** from the gear menu and enable the **Repository caching** option. - + :::note @@ -108,5 +108,4 @@ Partial parsing in dbt Cloud requires dbt version 1.4 or newer. The feature does To enable, select **Account settings** from the gear menu and enable the **Partial parsing** option. - - + \ No newline at end of file diff --git a/website/snippets/_new-sl-setup.md b/website/snippets/_new-sl-setup.md index 5c048824ac3..a02481db33d 100644 --- a/website/snippets/_new-sl-setup.md +++ b/website/snippets/_new-sl-setup.md @@ -18,17 +18,17 @@ If you've configured the legacy Semantic Layer, it has been deprecated, and dbt 3. In the **Project Details** page, navigate to the **Semantic Layer** section, and select **Configure Semantic Layer**. - + 4. In the **Set Up Semantic Layer Configuration** page, enter the credentials you want the Semantic Layer to use specific to your data platform. We recommend credentials have the least privileges required because your Semantic Layer users will be querying it in downstream applications. At a minimum, the Semantic Layer needs to have read access to the schema(s) that contains the dbt models that you used to build your semantic models. - + 5. Select the deployment environment you want for the Semantic Layer and click **Save**. 6. After saving it, you'll be provided with the connection information that allows you to connect to downstream tools. If your tool supports JDBC, save the JDBC URL or individual components (like environment id and host). 
If it uses the GraphQL API, save the GraphQL API host information instead. - + 7. Save and copy your environment ID, service token, and host, which you'll need to use downstream tools. For more info on how to integrate with partner integrations, refer to [Available integrations](/docs/use-dbt-semantic-layer/avail-sl-integrations). diff --git a/website/snippets/_sl-run-prod-job.md b/website/snippets/_sl-run-prod-job.md index b666cfa8e61..a637b0b431e 100644 --- a/website/snippets/_sl-run-prod-job.md +++ b/website/snippets/_sl-run-prod-job.md @@ -4,4 +4,4 @@ Once you’ve defined metrics in your dbt project, you can perform a job run in 2. Select **Jobs** to rerun the job with the most recent code in the deployment environment. 3. Your metric should appear as a red node in the dbt Cloud IDE and dbt directed acyclic graphs (DAG). - + diff --git a/website/snippets/quickstarts/intro-build-models-atop-other-models.md b/website/snippets/quickstarts/intro-build-models-atop-other-models.md index eeedec34892..1104461079b 100644 --- a/website/snippets/quickstarts/intro-build-models-atop-other-models.md +++ b/website/snippets/quickstarts/intro-build-models-atop-other-models.md @@ -2,4 +2,4 @@ As a best practice in SQL, you should separate logic that cleans up your data fr Now you can experiment by separating the logic out into separate models and using the [ref](/reference/dbt-jinja-functions/ref) function to build models on top of other models: - + diff --git a/website/src/components/lightbox/index.js b/website/src/components/lightbox/index.js index a3d211ea237..1c748bbb04f 100644 --- a/website/src/components/lightbox/index.js +++ b/website/src/components/lightbox/index.js @@ -1,56 +1,34 @@ -import React, { useState, useEffect } from 'react'; +import React from 'react'; import styles from './styles.module.css'; import imageCacheWrapper from '../../../functions/image-cache-wrapper'; function Lightbox({ - src, + src, collapsed = false, - alignment = "center", - alt = undefined, - 
title = undefined, + alignment = "center", + alt = undefined, + title = undefined, width = undefined, }) { - const [isHovered, setIsHovered] = useState(false); - const [expandImage, setExpandImage] = useState(false); - useEffect(() => { - let timeoutId; - - if (isHovered) { - // Delay the expansion by 5 milliseconds - timeoutId = setTimeout(() => { - setExpandImage(true); - }, 5); - } - - return () => { - clearTimeout(timeoutId); - }; - }, [isHovered]); - - const handleMouseEnter = () => { - setIsHovered(true); - }; - - const handleMouseLeave = () => { - setIsHovered(false); - setExpandImage(false); - }; + // Set alignment class if alignment prop used + let imageAlignment = '' + if(alignment === "left") { + imageAlignment = styles.leftAlignLightbox + } else if(alignment === "right") { + imageAlignment = styles.rightAlignLightbox + } return ( <> - + ); } diff --git a/website/src/components/lightbox/styles.module.css b/website/src/components/lightbox/styles.module.css index eb280b2feb7..af0bb086cf5 100644 --- a/website/src/components/lightbox/styles.module.css +++ b/website/src/components/lightbox/styles.module.css @@ -1,4 +1,3 @@ - :local(.title) { text-align: center; font-size: small; @@ -25,9 +24,3 @@ .rightAlignLightbox { margin: 10px 0 10px auto; } - -:local(.hovered) { /* Add the . 
before the class name */ - filter: drop-shadow(4px 4px 6px #aaaaaae1); - transition: transform 0.3s ease; - z-index: 9999; -} From a6266ebff2e61397316e70ae0e2d39f16b1a7e93 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Thu, 11 Jan 2024 17:39:30 +0000 Subject: [PATCH 30/56] add --- website/src/components/lightbox/index.js | 57 +++++++++++++------ .../src/components/lightbox/styles.module.css | 6 ++ 2 files changed, 46 insertions(+), 17 deletions(-) diff --git a/website/src/components/lightbox/index.js b/website/src/components/lightbox/index.js index 1c748bbb04f..a3d211ea237 100644 --- a/website/src/components/lightbox/index.js +++ b/website/src/components/lightbox/index.js @@ -1,34 +1,56 @@ -import React from 'react'; +import React, { useState, useEffect } from 'react'; import styles from './styles.module.css'; import imageCacheWrapper from '../../../functions/image-cache-wrapper'; function Lightbox({ - src, + src, collapsed = false, - alignment = "center", - alt = undefined, - title = undefined, + alignment = "center", + alt = undefined, + title = undefined, width = undefined, }) { + const [isHovered, setIsHovered] = useState(false); + const [expandImage, setExpandImage] = useState(false); - // Set alignment class if alignment prop used - let imageAlignment = '' - if(alignment === "left") { - imageAlignment = styles.leftAlignLightbox - } else if(alignment === "right") { - imageAlignment = styles.rightAlignLightbox - } + useEffect(() => { + let timeoutId; + + if (isHovered) { + // Delay the expansion by 5 milliseconds + timeoutId = setTimeout(() => { + setExpandImage(true); + }, 5); + } + + return () => { + clearTimeout(timeoutId); + }; + }, [isHovered]); + + const handleMouseEnter = () => { + setIsHovered(true); + }; + + const handleMouseLeave = () => { + setIsHovered(false); + setExpandImage(false); + }; return ( <> - @@ -37,13 +59,14 @@ function Lightbox({ alt={alt ? alt : title ? title : ''} title={title ? 
title : ''} src={imageCacheWrapper(src)} + style={expandImage ? { transform: 'scale(1.3)', transition: 'transform 0.3s ease', zIndex: '9999' } : {}} /> {title && ( { title } )} - +
); } diff --git a/website/src/components/lightbox/styles.module.css b/website/src/components/lightbox/styles.module.css index af0bb086cf5..3027a88f45a 100644 --- a/website/src/components/lightbox/styles.module.css +++ b/website/src/components/lightbox/styles.module.css @@ -24,3 +24,9 @@ .rightAlignLightbox { margin: 10px 0 10px auto; } + +:local(.hovered) { /* Add the . before the class name */ + filter: drop-shadow(4px 4px 6px #aaaaaae1); + transition: transform 0.3s ease; + z-index: 9999; +} From 1fc53fd6f8685128d0e810ed445ff5e727c815b0 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Thu, 11 Jan 2024 17:46:33 +0000 Subject: [PATCH 31/56] add width script --- ...022-11-22-move-spreadsheets-to-your-dwh.md | 10 +- .../blog/2022-11-30-dbt-project-evaluator.md | 4 +- .../blog/2023-01-17-grouping-data-tests.md | 4 +- ...01-ingestion-time-partitioning-bigquery.md | 2 +- website/blog/2023-03-23-audit-helper.md | 16 +-- ...ng-a-kimball-dimensional-model-with-dbt.md | 2 +- ...23-04-24-framework-refactor-alteryx-dbt.md | 10 +- ...odeling-ragged-time-varying-hierarchies.md | 2 +- .../2023-05-04-generating-dynamic-docs.md | 10 +- website/blog/2023-07-17-GPT-and-dbt-test.md | 6 +- ...023-08-01-announcing-materialized-views.md | 2 +- .../blog/2024-01-09-defer-in-development.md | 4 +- .../dbt-unity-catalog-best-practices.md | 4 +- .../docs/docs/build/custom-target-names.md | 4 +- website/docs/docs/build/data-tests.md | 2 +- .../docs/docs/build/environment-variables.md | 20 ++-- website/docs/docs/build/exposures.md | 4 +- website/docs/docs/build/python-models.md | 4 +- website/docs/docs/build/sources.md | 4 +- website/docs/docs/build/sql-models.md | 2 +- .../about-connections.md | 2 +- .../connect-apache-spark.md | 2 +- .../connect-databricks.md | 2 +- .../connect-snowflake.md | 4 +- .../connnect-bigquery.md | 4 +- .../docs/docs/cloud/git/authenticate-azure.md | 4 +- website/docs/docs/cloud/git/connect-github.md | 8 +- website/docs/docs/cloud/git/connect-gitlab.md | 14 +-- 
.../cloud/git/import-a-project-by-git-url.md | 12 +- website/docs/docs/cloud/git/setup-azure.md | 14 +-- .../cloud/manage-access/auth0-migration.md | 26 ++--- .../manage-access/cloud-seats-and-users.md | 5 +- .../manage-access/enterprise-permissions.md | 4 +- .../docs/cloud/manage-access/invite-users.md | 12 +- .../manage-access/set-up-bigquery-oauth.md | 10 +- .../manage-access/set-up-databricks-oauth.md | 4 +- .../manage-access/set-up-snowflake-oauth.md | 4 +- .../set-up-sso-google-workspace.md | 10 +- .../manage-access/set-up-sso-saml-2.0.md | 2 +- .../docs/cloud/manage-access/sso-overview.md | 2 +- .../docs/docs/cloud/secure/ip-restrictions.md | 4 +- .../docs/cloud/secure/redshift-privatelink.md | 14 +-- .../cloud/secure/snowflake-privatelink.md | 2 +- .../cloud-build-and-view-your-docs.md | 6 +- .../docs/docs/collaborate/documentation.md | 8 +- .../collaborate/git/managed-repository.md | 2 +- .../docs/collaborate/git/merge-conflicts.md | 10 +- .../docs/docs/collaborate/git/pr-template.md | 2 +- .../docs/collaborate/model-performance.md | 4 +- .../docs/dbt-cloud-apis/service-tokens.md | 2 +- .../docs/docs/dbt-cloud-apis/user-tokens.md | 2 +- .../removing-prerelease-versions.md | 2 +- .../run-details-and-logs-improvements.md | 2 +- .../81-May-2023/run-history-improvements.md | 2 +- .../86-Dec-2022/new-jobs-default-as-off.md | 2 +- .../92-July-2022/render-lineage-feature.md | 2 +- .../95-March-2022/ide-timeout-message.md | 2 +- .../95-March-2022/prep-and-waiting-time.md | 2 +- .../dbt-versions/upgrade-core-in-cloud.md | 6 +- website/docs/docs/deploy/artifacts.md | 8 +- website/docs/docs/deploy/ci-jobs.md | 4 +- .../docs/deploy/dashboard-status-tiles.md | 6 +- website/docs/docs/deploy/deploy-jobs.md | 2 +- website/docs/docs/deploy/deployment-tools.md | 6 +- website/docs/docs/deploy/source-freshness.md | 6 +- .../using-the-dbt-ide.md | 8 +- website/docs/faqs/API/rotate-token.md | 2 +- .../faqs/Accounts/change-users-license.md | 4 +- 
.../Accounts/cloud-upgrade-instructions.md | 6 +- website/docs/faqs/Project/delete-a-project.md | 4 +- website/docs/guides/adapter-creation.md | 14 +-- website/docs/guides/bigquery-qs.md | 4 +- website/docs/guides/codespace-qs.md | 2 +- website/docs/guides/custom-cicd-pipelines.md | 2 +- website/docs/guides/databricks-qs.md | 32 +++--- website/docs/guides/dbt-python-snowpark.md | 106 +++++++++--------- website/docs/guides/dremio-lakehouse.md | 8 +- website/docs/guides/manual-install-qs.md | 8 +- website/docs/guides/redshift-qs.md | 20 ++-- website/docs/guides/refactoring-legacy-sql.md | 4 +- website/docs/guides/set-up-ci.md | 6 +- website/docs/guides/snowflake-qs.md | 24 ++-- website/docs/guides/starburst-galaxy-qs.md | 10 +- .../node-selection/graph-operators.md | 2 +- .../resource-configs/bigquery-configs.md | 2 +- .../resource-configs/persist_docs.md | 4 +- .../resource-configs/spark-configs.md | 2 +- .../resource-properties/description.md | 2 +- website/docs/terms/dag.md | 6 +- website/docs/terms/data-lineage.md | 2 +- .../intro-build-models-atop-other-models.md | 2 +- 91 files changed, 327 insertions(+), 328 deletions(-) diff --git a/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md b/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md index 93cf91efeed..09274b41a9b 100644 --- a/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md +++ b/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md @@ -70,9 +70,9 @@ An obvious choice if you have data to load into your warehouse would be your exi [Fivetran’s browser uploader](https://fivetran.com/docs/files/browser-upload) does exactly what it says on the tin: you upload a file to their web portal and it creates a table containing that data in a predefined schema in your warehouse. With a visual interface to modify data types, it’s easy for anyone to use. 
And with an account type with the permission to only upload files, you don’t need to worry about your stakeholders accidentally breaking anything either. - + - + A nice benefit of the uploader is support for updating data in the table over time. If a file with the same name and same columns is uploaded, any new records will be added, and existing records (per the ) will be updated. @@ -100,7 +100,7 @@ The main benefit of connecting to Google Sheets instead of a static spreadsheet Instead of syncing all cells in a sheet, you create a [named range](https://fivetran.com/docs/files/google-sheets/google-sheets-setup-guide) and connect Fivetran to that range. Each Fivetran connector can only read a single range—if you have multiple tabs then you’ll need to create multiple connectors, each with its own schema and table in the target warehouse. When a sync takes place, it will [truncate](https://docs.getdbt.com/terms/ddl#truncate) and reload the table from scratch as there is no primary key to use for matching. - + Beware of inconsistent data types though—if someone types text into a column that was originally numeric, Fivetran will automatically convert the column to a string type which might cause issues in your downstream transformations. [The recommended workaround](https://fivetran.com/docs/files/google-sheets#typetransformationsandmapping) is to explicitly cast your types in [staging models](https://docs.getdbt.com/best-practices/how-we-structure/2-staging) to ensure that any undesirable records are converted to null. @@ -119,7 +119,7 @@ Beware of inconsistent data types though—if someone types text into a column t I’m a big fan of [Fivetran’s Google Drive connector](https://fivetran.com/docs/files/google-drive); in the past I’ve used it to streamline a lot of weekly reporting. It allows stakeholders to use a tool they’re already familiar with (Google Drive) instead of dealing with another set of credentials. 
Every file uploaded into a specific folder on Drive (or [Box, or consumer Dropbox](https://fivetran.com/docs/files/magic-folder)) turns into a table in your warehouse. - + Like the Google Sheets connector, the data types of the columns are determined automatically. Dates, in particular, are finicky though—if you can control your input data, try to get it into [ISO 8601 format](https://xkcd.com/1179/) to minimize the amount of cleanup you have to do on the other side. @@ -174,7 +174,7 @@ Each of the major data warehouses also has native integrations to import spreads Snowflake’s options are robust and user-friendly, offering both a [web-based loader](https://docs.snowflake.com/en/user-guide/data-load-web-ui.html) as well as [a bulk importer](https://docs.snowflake.com/en/user-guide/data-load-bulk.html). The web loader is suitable for small to medium files (up to 50MB) and can be used for specific files, all files in a folder, or files in a folder that match a given pattern. It’s also the most provider-agnostic, with support for Amazon S3, Google Cloud Storage, Azure and the local file system. - + ### BigQuery diff --git a/website/blog/2022-11-30-dbt-project-evaluator.md b/website/blog/2022-11-30-dbt-project-evaluator.md index 3ea7a459c35..b936d4786cd 100644 --- a/website/blog/2022-11-30-dbt-project-evaluator.md +++ b/website/blog/2022-11-30-dbt-project-evaluator.md @@ -20,7 +20,7 @@ If you attended [Coalesce 2022](https://www.youtube.com/watch?v=smbRwmcM1Ok), yo Don’t believe me??? Here’s photographic proof. - + Since the inception of dbt Labs, our team has been embedded with a variety of different data teams — from an over-stretched-data-team-of-one to a data-mesh-multiverse. @@ -120,4 +120,4 @@ If something isn’t working quite right or you have ideas for future functional Together, we can ensure that dbt projects across the galaxy are set up for success as they grow to infinity and beyond. 
- + diff --git a/website/blog/2023-01-17-grouping-data-tests.md b/website/blog/2023-01-17-grouping-data-tests.md index 23fcce6d27e..3648837302b 100644 --- a/website/blog/2023-01-17-grouping-data-tests.md +++ b/website/blog/2023-01-17-grouping-data-tests.md @@ -43,11 +43,11 @@ So what do we discover when we validate our data by group? Testing for monotonicity, we find many poorly behaved turnstiles. Unlike the well-behaved dark blue line, other turnstiles seem to _decrement_ versus _increment_ with each rotation while still others cyclically increase and plummet to zero – perhaps due to maintenance events, replacements, or glitches in communication with the central server. - + Similarly, while no expected timestamp is missing from the data altogether, a more rigorous test of timestamps _by turnstile_ reveals between roughly 50-100 missing observations for any given period. - + _Check out this [GitHub gist](https://gist.github.com/emilyriederer/4dcc6a05ea53c82db175e15f698a1fb6) to replicate these views locally._ diff --git a/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md b/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md index 99ce142d5ed..51a62006ee8 100644 --- a/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md +++ b/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md @@ -125,7 +125,7 @@ In both cases, the operation can be done on a single partition at a time so it r On a 192 GB partition here is how the different methods compare: - + Also, the `SELECT` statement consumed more than 10 hours of slot time while `MERGE` statement took days of slot time. 
diff --git a/website/blog/2023-03-23-audit-helper.md b/website/blog/2023-03-23-audit-helper.md index 8599ad5eb5d..106715c5e4f 100644 --- a/website/blog/2023-03-23-audit-helper.md +++ b/website/blog/2023-03-23-audit-helper.md @@ -19,7 +19,7 @@ It is common for analytics engineers (AE) and data analysts to have to refactor Not only is that approach time-consuming, but it is also prone to naive assumptions that values match based on aggregate measures (such as counts or sums). To provide a better, more accurate approach to auditing, dbt Labs has created the `audit_helper` package. `audit_helper` is a package for dbt whose main purpose is to audit data by comparing two tables (the original one versus a refactored model). It uses a simple and intuitive query structure that enables quickly comparing tables based on the column values, row amount, and even column types (for example, to make sure that a given column is numeric in both your table and the original one). Figure 1 graphically displays the workflow and where `audit_helper` is positioned in the refactoring process. - + Now that it is clear where the `audit_helper` package is positioned in the refactoring process, it is important to highlight the benefits of using audit_helper (and ultimately, of auditing refactored models). Among the benefits, we can mention: - **Quality assurance**: Assert that a refactored model is reaching the same output as the original model that is being refactored. @@ -57,12 +57,12 @@ According to the `audit_helper` package documentation, this macro comes in handy ### How it works When you run the dbt audit model, it will compare all columns, row by row. To count for the match, every column in a row from one source must exactly match a row from another source, as illustrated in the example in Figure 2 below: - + As shown in the example, the model is compared line by line, and in this case, all lines in both models are equivalent and the result should be 100%. 
Figure 3 below depicts a row in which two of the three columns are equal and only the last column of row 1 has divergent values. In this case, despite the fact that most of row 1 is identical, that row will not be counted towards the final result. In this example, only row 2 and row 3 are valid, yielding a 66.6% match in the total of analyzed rows. - + As previously stated, for the match to be valid, all column values of a model’s row must be equal to the other model. This is why we sometimes need to exclude columns from the comparison (such as date columns, which can have a time zone difference from the original model to the refactored — we will discuss tips like these below). @@ -103,12 +103,12 @@ Let’s understand the arguments used in the `compare_queries` macro: - `summarize` (optional): This argument allows you to switch between a summary or detailed (verbose) view of the compared data. This argument accepts true or false values (its default is set to be true). 3. Replace the sources from the example with your own - + As illustrated in Figure 4, using the `ref` statements allows you to easily refer to your development model, and using the full path makes it easy to refer to the original table (which will be useful when you are refactoring a SQL Server Stored Procedure or Alteryx Workflow that is already being materialized in the data warehouse). 4. Specify your comparison columns - + Delete the example columns and replace them with the columns of your models, exactly as they are written in each model. You should rename/alias the columns to match, as well as ensuring they are in the same order within the `select` clauses. @@ -129,7 +129,7 @@ Let’s understand the arguments used in the `compare_queries` macro: ``` The output will be the similar to the one shown in Figure 6 below: - +
The output is presented in table format, with each column explained below:
@@ -155,7 +155,7 @@ While we can surely rely on that overview to validate the final refactored model A really useful way to check out which specific columns are driving down the match percentage between tables is the `compare_column_values` macro that allows us to audit column values. This macro requires a column to be set, so it can be used as an anchor to compare entries between the refactored dbt model column and the legacy table column. Figure 7 illustrates how the `compare_column_value`s macro works. - + The macro’s output summarizes the status of column compatibility, breaking it down into different categories: perfect match, both are null, values do not match, value is null in A only, value is null in B only, missing from A and missing from B. This level of detailing makes it simpler for the AE or data analyst to figure out what can be causing incompatibility issues between the models. While refactoring a model, it is common that some keys used to join models are inconsistent, bringing up unwanted null values on the final model as a result, and that would cause the audit row query to fail, without giving much more detail. @@ -224,7 +224,7 @@ Also, we can see that the example code includes a table printing option enabled But unlike from the `compare_queries` macro, if you have kept the printing function enabled, you should expect a table to be printed in the command line when you run the model, as shown in Figure 8. 
Otherwise, it will be materialized on your data warehouse like this: - + The `compare_column_values` macro separates column auditing results in seven different labels: - **Perfect match**: count of rows (and relative percentage) where the column values compared between both tables are equal and not null; diff --git a/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md b/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md index ab364749eff..691a7f77571 100644 --- a/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md +++ b/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md @@ -39,7 +39,7 @@ Dimensional modeling is a technique introduced by Ralph Kimball in 1996 with his The goal of dimensional modeling is to take raw data and transform it into Fact and Dimension tables that represent the business. - + The benefits of dimensional modeling are: diff --git a/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md b/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md index 46cfcb58cdd..2c6a9d87591 100644 --- a/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md +++ b/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md @@ -17,7 +17,7 @@ Alteryx is a visual data transformation platform with a user-friendly interface Transforming data to follow business rules can be a complex task, especially with the increasing amount of data collected by companies. To reduce such complexity, data transformation solutions designed as drag-and-drop tools can be seen as more intuitive, since analysts can visualize the steps taken to transform data. One example of a popular drag-and-drop transformation tool is Alteryx which allows business analysts to transform data by dragging and dropping operators in a canvas. The graphic interface of Alteryx Designer is presented in **Figure 1**. 
- + Nonetheless, as data workflows become more complex, Alteryx lacks the modularity, documentation, and version control capabilities that these flows require. In this sense, dbt may be a more suitable solution to building resilient and modular data pipelines due to its focus on data modeling. @@ -62,7 +62,7 @@ This blog post reports a consulting project for a major client at Indicium Tech When the client hired Indicium, they had dozens of Alteryx workflows built and running daily solely for the marketing team, which was the focus of the project. For the marketing team, the Alteryx workflows had to be executed in the correct order since they were interdependent, which means one Alteryx workflow used the outcome of the previous one, and so on. The main Alteryx workflows run daily by the marketing team took about 6 hours to run. Another important aspect to consider was that if a model had not finished running when the next one downstream began to run, the data would be incomplete, requiring the workflow to be run again. The execution of all models was usually scheduled to run overnight and by early morning, so the data would be up to date the next day. But if there was an error the night before, the data would be incorrect or out of date. **Figure 3** exemplifies the scheduler. - + Data lineage was a point that added a lot of extra labor because it was difficult to identify which models were dependent on others with so many Alteryx workflows built. When the number of workflows increased, it required a long time to create a view of that lineage in another software. So, if a column's name changed in a model due to a change in the model's source, the marketing analysts would have to map which downstream models were impacted by such change to make the necessary adjustments. Because model lineage was mapped manually, it was a challenge to keep it up to date. 
@@ -89,7 +89,7 @@ The first step is to validate all data sources and create one com It is essential to click on each data source (the green book icons on the leftmost side of **Figure 5**) and examine whether any transformations have been done inside that data source query. It is very common for a source icon to contain more than one data source or filter, which is why this step is important. The next step is to follow the workflow and transcribe the transformations into SQL queries in the dbt models to replicate the same data transformations as in the Alteryx workflow. - + For this step, we identified which operators were used in the data source (for example, joining data, order columns, group by, etc). Usually the Alteryx operators are pretty self-explanatory and all the information needed for understanding appears on the left side of the menu. We also checked the documentation to understand how each Alteryx operator works behind the scenes. @@ -102,7 +102,7 @@ Auditing large models, with sometimes dozens of columns and millions of rows, ca In this project, we used [the `audit_helper` package](https://github.com/dbt-labs/dbt-audit-helper), because it provides more robust auditing macros and offers more automation possibilities for our use case. To that end, we needed to have both the legacy Alteryx workflow output table and the refactored dbt model materialized in the project’s data warehouse. Then we used the macros available in `audit_helper` to compare query results, data types, column values, row numbers and many more things that are available within the package. For an in-depth explanation and tutorial on how to use the `audit_helper` package, check out [this blog post](https://docs.getdbt.com/blog/audit-helper-for-migration). **Figure 6** graphically illustrates the validation logic behind audit_helper. 
- + #### Step 4: Duplicate reports and connect them to the dbt refactored models @@ -120,7 +120,7 @@ The conversion proved to be of great value to the client due to three main aspec - Improved workflow visibility: dbt’s support for documentation and testing, associated with dbt Cloud, allows for great visibility of the workflow’s lineage execution, accelerating errors and data inconsistencies identification and troubleshooting. More than once, our team was able to identify the impact of one column’s logic alteration in downstream models much earlier than these Alteryx models. - Workflow simplification: dbt’s modularized approach of data modeling, aside from accelerating total run time of the data workflow, simplified the construction of new tables, based on the already existing modules, and improved code readability. - + As we can see, refactoring Alteryx to dbt was an important step in the direction of data availability, and allowed for much more agile processes for the client’s data team. With less time dedicated to manually executing sequential Alteryx workflows that took hours to complete, and searching for errors in each individual file, the analysts could focus on what they do best: **getting insights from the data and generating value from them**. diff --git a/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md b/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md index f719bdb40cb..2b00787cc07 100644 --- a/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md +++ b/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md @@ -22,7 +22,7 @@ To help visualize this data, we're going to pretend we are a company that manufa Obviously, a real bike could have a hundred or more separate components. To keep things simple for this article, let's just consider the bike, the frame, a wheel, the wheel rim, tire, and tube. 
Our component hierarchy looks like: - + This hierarchy is *ragged* because different paths through the hierarchy terminate at different depths. It is *time-varying* because specific components can be added and removed. diff --git a/website/blog/2023-05-04-generating-dynamic-docs.md b/website/blog/2023-05-04-generating-dynamic-docs.md index 1e704178b0a..b6e8d929e72 100644 --- a/website/blog/2023-05-04-generating-dynamic-docs.md +++ b/website/blog/2023-05-04-generating-dynamic-docs.md @@ -35,7 +35,7 @@ This results in a lot of the same columns (e.g. `account_id`) existing in differ In fact, I found a better way using some CLI commands, the dbt Codegen package and docs blocks. I also made the following meme in the [dbt Community Slack](https://www.getdbt.com/community/join-the-community/) channel #memes-and-off-topic-chatter to encapsulate this method: - + ## What pain is being solved? @@ -279,7 +279,7 @@ To confirm the formatting works, run the following command to get dbt Docs up an ``` $ dbt docs && dbt docs serve ``` - + Here, you can confirm that the column descriptions using the doc blocks are working as intended. @@ -326,7 +326,7 @@ user_id ``` Now, open your code editor, and replace `(.*)` with `{% docs column__activity_based_interest__$1 %}\n\n{% enddocs %}\n`, which will result in the following in your markdown file: - + Now you can add documentation to each of your columns. @@ -334,7 +334,7 @@ Now you can add documentation to each of your columns. You can programmatically identify all columns, and have them point towards the newly-created documentation. In your code editor, replace `\s{6}- name: (.*)\n description: ""` with ` - name: $1\n description: "{{ doc('column__activity_based_interest__$1') }}`: - + ⚠️ Some of your columns may already be available in existing docs blocks. 
In this example, the following replacements are done: - `{{ doc('column__activity_based_interest__user_id') }}` → `{{ doc("column_user_id") }}` @@ -343,7 +343,7 @@ You can programmatically identify all columns, and have them point towards the n ## Check that everything works Run `dbt docs generate`. If there are syntax errors, this will be found out at this stage. If successful, we can run `dbt docs serve` to perform a smoke test and ensure everything looks right: - + ## Additional considerations diff --git a/website/blog/2023-07-17-GPT-and-dbt-test.md b/website/blog/2023-07-17-GPT-and-dbt-test.md index 84f756919a5..12e380eb220 100644 --- a/website/blog/2023-07-17-GPT-and-dbt-test.md +++ b/website/blog/2023-07-17-GPT-and-dbt-test.md @@ -55,7 +55,7 @@ We all know how ChatGPT can digest very complex prompts, but as this is a tool f Opening ChatGPT with GPT4, my first prompt is usually along these lines: - + And the output of this simple prompt is nothing short of amazing: @@ -118,7 +118,7 @@ Back in my day (5 months ago), ChatGPT with GPT 3.5 didn’t have much context o A prompt for it would look something like: - + ## Specify details on generic tests in your prompts @@ -133,7 +133,7 @@ Accepted_values and relationships are slightly trickier but the model can be adj One way of doing this is with a prompt like this: - + Which results in the following output: diff --git a/website/blog/2023-08-01-announcing-materialized-views.md b/website/blog/2023-08-01-announcing-materialized-views.md index eb9716e73a5..6534e1d0b56 100644 --- a/website/blog/2023-08-01-announcing-materialized-views.md +++ b/website/blog/2023-08-01-announcing-materialized-views.md @@ -103,7 +103,7 @@ When we talk about using materialized views in development, the question to thin Outside of the scheduling part, development will be pretty standard. 
Your pipeline is likely going to look something like this: - + This is assuming you have a near real time pipeline where you are pulling from a streaming data source like a Kafka Topic via an ingestion tool of your choice like Snowpipe for Streaming into your data platform. After your data is in the data platform, you will: diff --git a/website/blog/2024-01-09-defer-in-development.md b/website/blog/2024-01-09-defer-in-development.md index 96e2ed53f85..406b036cab4 100644 --- a/website/blog/2024-01-09-defer-in-development.md +++ b/website/blog/2024-01-09-defer-in-development.md @@ -80,7 +80,7 @@ dbt Cloud offers a seamless deferral experience in both the dbt Cloud IDE and th In the dbt Cloud IDE, there’s as simple toggle switch labeled `Defer to production`. Simply enabling this toggle will defer your command to the production environment when you run any dbt command in the IDE! - + The cloud CLI has this setting *on by default* — there’s nothing else you need to do to set this up! If you prefer not to defer, you can pass the `--no-defer` flag to override this behavior. You can also set an environment other than your production environment as the deferred to environment in your `dbt-cloud` settings in your `dbt_project.yml` : @@ -155,6 +155,6 @@ While defer is a faster and cheaper option for most folks in most situations, de ### Call me Willem Defer - + Defer to prod is a powerful way to improve your development velocity with dbt, and dbt Cloud makes it easier than ever to make use of this feature! You too could look this cool while you’re saving time and money developing on your dbt projects! 
diff --git a/website/docs/best-practices/dbt-unity-catalog-best-practices.md b/website/docs/best-practices/dbt-unity-catalog-best-practices.md index a55e1d121af..5f230263cf8 100644 --- a/website/docs/best-practices/dbt-unity-catalog-best-practices.md +++ b/website/docs/best-practices/dbt-unity-catalog-best-practices.md @@ -21,11 +21,11 @@ If you use multiple Databricks workspaces to isolate development from production To do so, use dbt's [environment variable syntax](https://docs.getdbt.com/docs/dbt-cloud/using-dbt-cloud/cloud-environment-variables#special-environment-variables) for Server Hostname of your Databricks workspace URL and HTTP Path for the SQL warehouse in your connection settings. Note that Server Hostname still needs to appear to be a valid domain name to pass validation checks, so you will need to hard-code the domain suffix on the URL, eg `{{env_var('DBT_HOSTNAME')}}.cloud.databricks.com` and the path prefix for your warehouses, eg `/sql/1.0/warehouses/{{env_var('DBT_HTTP_PATH')}}`. - + When you create environments in dbt Cloud, you can assign environment variables to populate the connection information dynamically. Don’t forget to make sure the tokens you use in the credentials for those environments were generated from the associated workspace. - + ## Access Control diff --git a/website/docs/docs/build/custom-target-names.md b/website/docs/docs/build/custom-target-names.md index ac7036de572..4786641678d 100644 --- a/website/docs/docs/build/custom-target-names.md +++ b/website/docs/docs/build/custom-target-names.md @@ -21,9 +21,9 @@ where created_at > date_trunc('month', current_date) To set a custom target name for a job in dbt Cloud, configure the **Target Name** field for your job in the Job Settings page. - + ## dbt Cloud IDE When developing in dbt Cloud, you can set a custom target name in your development credentials. 
Go to your account (from the gear menu in the top right hand corner), select the project under **Credentials**, and update the target name. - + diff --git a/website/docs/docs/build/data-tests.md b/website/docs/docs/build/data-tests.md index d981d7e272d..7c12e5d7059 100644 --- a/website/docs/docs/build/data-tests.md +++ b/website/docs/docs/build/data-tests.md @@ -245,7 +245,7 @@ Normally, a data test query will calculate failures as part of its execution. If This workflow allows you to query and examine failing records much more quickly in development: - + Note that, if you elect to store test failures: * Test result tables are created in a schema suffixed or named `dbt_test__audit`, by default. It is possible to change this value by setting a `schema` config. (For more details on schema naming, see [using custom schemas](/docs/build/custom-schemas).) diff --git a/website/docs/docs/build/environment-variables.md b/website/docs/docs/build/environment-variables.md index 3f2aebd0036..14076352ac1 100644 --- a/website/docs/docs/build/environment-variables.md +++ b/website/docs/docs/build/environment-variables.md @@ -17,7 +17,7 @@ Environment variables in dbt Cloud must be prefixed with either `DBT_` or `DBT_E Environment variable values can be set in multiple places within dbt Cloud. As a result, dbt Cloud will interpret environment variables according to the following order of precedence (lowest to highest): - + There are four levels of environment variables: 1. the optional default argument supplied to the `env_var` Jinja function in code @@ -30,7 +30,7 @@ There are four levels of environment variables: To set environment variables at the project and environment level, click **Deploy** in the top left, then select **Environments**. Click **Environments Variables** to add and update your environment variables. - + @@ -38,7 +38,7 @@ You'll notice there is a `Project Default` column. 
This is a great place to set To the right of the `Project Default` column are all your environments. Values set at the environment level take priority over the project level default value. This is where you can tell dbt Cloud to interpret an environment value differently in your Staging vs. Production environment, as example. - + @@ -48,12 +48,12 @@ You may have multiple jobs that run in the same environment, and you'd like the When setting up or editing a job, you will see a section where you can override environment variable values defined at the environment or project level. - + Every job runs in a specific, deployment environment, and by default, a job will inherit the values set at the environment level (or the highest precedence level set) for the environment in which it runs. If you'd like to set a different value at the job level, edit the value to override it. - + **Overriding environment variables at the personal level** @@ -61,11 +61,11 @@ Every job runs in a specific, deployment environment, and by default, a job will You can also set a personal value override for an environment variable when you develop in the dbt integrated developer environment (IDE). By default, dbt Cloud uses environment variable values set in the project's development environment. To see and override these values, click the gear icon in the top right. Under "Your Profile," click **Credentials** and select your project. Click **Edit** and make any changes in "Environment Variables." - + To supply an override, developers can edit and specify a different value to use. These values will be respected in the IDE both for the Results and Compiled SQL tabs. - + :::info Appropriate coverage If you have not set a project level default value for every environment variable, it may be possible that dbt Cloud does not know how to interpret the value of an environment variable in all contexts. In such cases, dbt will throw a compilation error: "Env var required but not provided". 
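The order of precedence described in the hunks above (code default, then project default, then environment value, then a job-level or personal override) can be modeled as a lookup from the highest level down. The following is a minimal sketch for illustration only, not dbt Cloud's implementation: the level labels, the `resolve_env_var` helper, and the `DBT_REGION` variable are hypothetical names introduced here.

```python
# Precedence levels from lowest to highest, following the text above.
# These labels are chosen for this sketch; they are not dbt Cloud identifiers.
PRECEDENCE = ["code_default", "project_default", "environment", "override"]


def resolve_env_var(name: str, values_by_level: dict[str, dict[str, str]]) -> str:
    """Return the value from the highest-precedence level that defines `name`."""
    for level in reversed(PRECEDENCE):
        value = values_by_level.get(level, {}).get(name)
        if value is not None:
            return value
    # Loosely mirrors the compilation error quoted in the text above.
    raise KeyError(f"Env var required but not provided: '{name}'")


# Hypothetical values: the environment-level value wins over the project default.
levels = {
    "project_default": {"DBT_REGION": "us"},
    "environment": {"DBT_REGION": "eu"},
}
resolved = resolve_env_var("DBT_REGION", levels)
```

In this model, adding an `override` entry (standing in for a job-level or personal override) would take priority over both lower levels, and a variable defined at no level raises an error rather than silently resolving to an empty value.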
@@ -77,7 +77,7 @@ If you change the value of an environment variable mid-session while using the I To refresh the IDE mid-development, click on either the green 'ready' signal or the red 'compilation error' message at the bottom right corner of the IDE. A new modal will pop up, and you should select the Refresh IDE button. This will load your environment variables values into your development environment. - + There are some known issues with partial parsing of a project and changing environment variables mid-session in the IDE. If you find that your dbt project is not compiling to the values you've set, try deleting the `target/partial_parse.msgpack` file in your dbt project which will force dbt to re-compile your whole project. @@ -86,7 +86,7 @@ There are some known issues with partial parsing of a project and changing envir While all environment variables are encrypted at rest in dbt Cloud, dbt Cloud has additional capabilities for managing environment variables with secret or otherwise sensitive values. If you want a particular environment variable to be scrubbed from all logs and error messages, in addition to obfuscating the value in the UI, you can prefix the key with `DBT_ENV_SECRET_`. This functionality is supported from `dbt v1.0` and on. - + **Note**: An environment variable can be used to store a [git token for repo cloning](/docs/build/environment-variables#clone-private-packages). We recommend you make the git token's permissions read only and consider using a machine account or service user's PAT with limited repo access in order to practice good security hygiene. @@ -131,7 +131,7 @@ Currently, it's not possible to dynamically set environment variables across mod **Note** — You can also use this method with Databricks SQL Warehouse. 
- + :::info Environment variables and Snowflake OAuth limitations Env vars works fine with username/password and keypair, including scheduled jobs, because dbt Core consumes the Jinja inserted into the autogenerated `profiles.yml` and resolves it to do an `env_var` lookup. diff --git a/website/docs/docs/build/exposures.md b/website/docs/docs/build/exposures.md index 65c0792e0a0..a26ac10bd36 100644 --- a/website/docs/docs/build/exposures.md +++ b/website/docs/docs/build/exposures.md @@ -118,8 +118,8 @@ dbt test -s +exposure:weekly_jaffle_report When we generate our documentation site, you'll see the exposure appear: - - + + ## Related docs diff --git a/website/docs/docs/build/python-models.md b/website/docs/docs/build/python-models.md index 3fe194a4cb7..b24d3129f0c 100644 --- a/website/docs/docs/build/python-models.md +++ b/website/docs/docs/build/python-models.md @@ -660,7 +660,7 @@ Use the `cluster` submission method with dedicated Dataproc clusters you or your - Enable Dataproc APIs for your project + region - If using the `cluster` submission method: Create or use an existing [Dataproc cluster](https://cloud.google.com/dataproc/docs/guides/create-cluster) with the [Spark BigQuery connector initialization action](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/connectors#bigquery-connectors). (Google recommends copying the action into your own Cloud Storage bucket, rather than using the example version shown in the screenshot) - + The following configurations are needed to run Python models on Dataproc. You can add these to your [BigQuery profile](/docs/core/connect-data-platform/bigquery-setup#running-python-models-on-dataproc) or configure them on specific Python models: - `gcs_bucket`: Storage bucket to which dbt will upload your model's compiled PySpark code. 
@@ -706,7 +706,7 @@ Google recommends installing Python packages on Dataproc clusters via initializa You can also install packages at cluster creation time by [defining cluster properties](https://cloud.google.com/dataproc/docs/tutorials/python-configuration#image_version_20): `dataproc:pip.packages` or `dataproc:conda.packages`. - + **Docs:** - [Dataproc overview](https://cloud.google.com/dataproc/docs/concepts/overview) diff --git a/website/docs/docs/build/sources.md b/website/docs/docs/build/sources.md index 466bcedc688..e4fb10ac725 100644 --- a/website/docs/docs/build/sources.md +++ b/website/docs/docs/build/sources.md @@ -84,7 +84,7 @@ left join raw.jaffle_shop.customers using (customer_id) Using the `{{ source () }}` function also creates a dependency between the model and the source table. - + ### Testing and documenting sources You can also: @@ -189,7 +189,7 @@ from raw.jaffle_shop.orders The results of this query are used to determine whether the source is fresh or not: - + ### Filter diff --git a/website/docs/docs/build/sql-models.md b/website/docs/docs/build/sql-models.md index a0dd174278b..d33e4798974 100644 --- a/website/docs/docs/build/sql-models.md +++ b/website/docs/docs/build/sql-models.md @@ -254,7 +254,7 @@ create view analytics.customers as ( dbt uses the `ref` function to: * Determine the order to run the models by creating a dependent acyclic graph (DAG). - + * Manage separate environments — dbt will replace the model specified in the `ref` function with the database name for the (or view). Importantly, this is environment-aware — if you're running dbt with a target schema named `dbt_alice`, it will select from an upstream table in the same schema. Check out the tabs above to see this in action. 
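The two roles of `ref` described in the `sql-models.md` hunk above (ordering models through a DAG, and resolving names per environment) can be illustrated with a small sketch. This is an assumption-laden toy, not how dbt itself parses or compiles projects: `REF_PATTERN`, `build_order`, and `compile_refs` are hypothetical helpers, the model names are made up, and real dbt resolves a `ref` to a fully qualified relation rather than the simplified `schema.name` used here.

```python
import re
from graphlib import TopologicalSorter

# Matches {{ ref('model_name') }} with single or double quotes (simplified).
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def build_order(models: dict[str, str]) -> list[str]:
    """Order models so that every model runs after the models it refs."""
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


def compile_refs(sql: str, target_schema: str) -> str:
    """Replace each ref with a schema-qualified name for the given target."""
    return REF_PATTERN.sub(lambda m: f"{target_schema}.{m.group(1)}", sql)


# Hypothetical project: `customers` depends on two staging models.
models = {
    "customers": "select * from {{ ref('stg_customers') }} "
                 "left join {{ ref('stg_orders') }} using (customer_id)",
    "stg_customers": "select 1 as customer_id",
    "stg_orders": "select 1 as customer_id",
}
order = build_order(models)
compiled = compile_refs("select * from {{ ref('stg_customers') }}", "dbt_alice")
```

Here `customers` is always scheduled after both staging models, and the same SQL compiles against `dbt_alice` or any other target schema without edits, which is the environment-aware behavior the text describes.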
diff --git a/website/docs/docs/cloud/connect-data-platform/about-connections.md b/website/docs/docs/cloud/connect-data-platform/about-connections.md index 93bbf83584f..bc4a515112d 100644 --- a/website/docs/docs/cloud/connect-data-platform/about-connections.md +++ b/website/docs/docs/cloud/connect-data-platform/about-connections.md @@ -22,7 +22,7 @@ import MSCallout from '/snippets/_microsoft-adapters-soon.md'; You can connect to your database in dbt Cloud by clicking the gear in the top right and selecting **Account Settings**. From the Account Settings page, click **+ New Project**. - + These connection instructions provide the basic fields required for configuring a data platform connection in dbt Cloud. For more detailed guides, which include demo project data, read our [Quickstart guides](https://docs.getdbt.com/guides) diff --git a/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md b/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md index 0186d821a54..eecf0a8e229 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md @@ -36,4 +36,4 @@ HTTP and Thrift connection methods: | Auth | Optional, supply if using Kerberos | `KERBEROS` | | Kerberos Service Name | Optional, supply if using Kerberos | `hive` | - + diff --git a/website/docs/docs/cloud/connect-data-platform/connect-databricks.md b/website/docs/docs/cloud/connect-data-platform/connect-databricks.md index 032246ad16a..ebf6be63bd1 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-databricks.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-databricks.md @@ -37,4 +37,4 @@ To set up the Databricks connection, supply the following fields: | HTTP Path | The HTTP path of the Databricks cluster or SQL warehouse | /sql/1.0/warehouses/1a23b4596cd7e8fg | | Catalog | Name of Databricks Catalog (optional) | Production | - + diff --git 
a/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md b/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md index c265529fb49..9193a890ed3 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md @@ -27,7 +27,7 @@ username (specifically, the `login_name`) and the corresponding user's Snowflake to authenticate dbt Cloud to run queries against Snowflake on behalf of a Snowflake user. **Note**: The schema field in the **Developer Credentials** section is a required field. - + ### Key Pair @@ -68,7 +68,7 @@ As of dbt version 1.5.0, you can use a `private_key` string in place of `private The OAuth auth method permits dbt Cloud to run development queries on behalf of a Snowflake user without the configuration of Snowflake password in dbt Cloud. For more information on configuring a Snowflake OAuth connection in dbt Cloud, please see [the docs on setting up Snowflake OAuth](/docs/cloud/manage-access/set-up-snowflake-oauth). - + ## Configuration diff --git a/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md b/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md index 7ea6e380000..2e637b7450a 100644 --- a/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md +++ b/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md @@ -32,7 +32,7 @@ In addition to these fields, there are two other optional fields that can be con - + ### BigQuery OAuth **Available in:** Development environments, Enterprise plans only @@ -43,7 +43,7 @@ more information on the initial configuration of a BigQuery OAuth connection in [the docs on setting up BigQuery OAuth](/docs/cloud/manage-access/set-up-bigquery-oauth). 
As an end user, if your organization has set up BigQuery OAuth, you can link a project with your personal BigQuery account in your personal Profile in dbt Cloud, like so: - + ## Configuration diff --git a/website/docs/docs/cloud/git/authenticate-azure.md b/website/docs/docs/cloud/git/authenticate-azure.md index 42028bf993b..bbb2cff8b29 100644 --- a/website/docs/docs/cloud/git/authenticate-azure.md +++ b/website/docs/docs/cloud/git/authenticate-azure.md @@ -16,11 +16,11 @@ Connect your dbt Cloud profile to Azure DevOps using OAuth: 1. Click the gear icon at the top right and select **Profile settings**. 2. Click **Linked Accounts**. 3. Next to Azure DevOps, click **Link**. - + 4. Once you're redirected to Azure DevOps, sign into your account. 5. When you see the permission request screen from Azure DevOps App, click **Accept**. - + You will be directed back to dbt Cloud, and your profile should be linked. You are now ready to develop in dbt Cloud! diff --git a/website/docs/docs/cloud/git/connect-github.md b/website/docs/docs/cloud/git/connect-github.md index ff0f2fff18f..715f23912e5 100644 --- a/website/docs/docs/cloud/git/connect-github.md +++ b/website/docs/docs/cloud/git/connect-github.md @@ -30,13 +30,13 @@ To connect your dbt Cloud account to your GitHub account: 2. Select **Linked Accounts** from the left menu. - + 3. In the **Linked Accounts** section, set up your GitHub account connection to dbt Cloud by clicking **Link** to the right of GitHub. This redirects you to your account on GitHub where you will be asked to install and configure the dbt Cloud application. 4. Select the GitHub organization and repositories dbt Cloud should access. - + 5. 
Assign the dbt Cloud GitHub App the following permissions: - Read access to metadata @@ -52,7 +52,7 @@ To connect your dbt Cloud account to your GitHub account: ## Limiting repository access in GitHub If you are your GitHub organization owner, you can also configure the dbt Cloud GitHub application to have access to only select repositories. This configuration must be done in GitHub, but we provide an easy link in dbt Cloud to start this process. - + ## Personally authenticate with GitHub @@ -70,7 +70,7 @@ To connect a personal GitHub account: 2. Select **Linked Accounts** in the left menu. If your GitHub account is not connected, you’ll see "No connected account". 3. Select **Link** to begin the setup process. You’ll be redirected to GitHub, and asked to authorize dbt Cloud in a grant screen. - + 4. Once you approve authorization, you will be redirected to dbt Cloud, and you should now see your connected account. diff --git a/website/docs/docs/cloud/git/connect-gitlab.md b/website/docs/docs/cloud/git/connect-gitlab.md index e55552e2d86..316e6af0135 100644 --- a/website/docs/docs/cloud/git/connect-gitlab.md +++ b/website/docs/docs/cloud/git/connect-gitlab.md @@ -22,11 +22,11 @@ To connect your GitLab account: 2. Select **Linked Accounts** in the left menu. 3. Click **Link** to the right of your GitLab account. - + When you click **Link**, you will be redirected to GitLab and prompted to sign into your account. GitLab will then ask for your explicit authorization: - + Once you've accepted, you should be redirected back to dbt Cloud, and you'll see that your account has been linked to your profile. @@ -52,7 +52,7 @@ For more detail, GitLab has a [guide for creating a Group Application](https://d In GitLab, navigate to your group settings and select **Applications**. Here you'll see a form to create a new application. 
- + In GitLab, when creating your Group Application, input the following: @@ -67,7 +67,7 @@ Replace `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cl The application form in GitLab should look as follows when completed: - + Click **Save application** in GitLab, and GitLab will then generate an **Application ID** and **Secret**. These values will be available even if you close the app screen, so this is not the only chance you have to save them. @@ -76,7 +76,7 @@ If you're a Business Critical customer using [IP restrictions](/docs/cloud/secur ### Adding the GitLab OAuth application to dbt Cloud After you've created your GitLab application, you need to provide dbt Cloud information about the app. In dbt Cloud, account admins should navigate to **Account Settings**, click on the **Integrations** tab, and expand the GitLab section. - + In dbt Cloud, input the following values: @@ -92,7 +92,7 @@ Once the form is complete in dbt Cloud, click **Save**. You will then be redirected to GitLab and prompted to sign into your account. GitLab will ask for your explicit authorization: - + Once you've accepted, you should be redirected back to dbt Cloud, and your integration is ready for developers on your team to [personally authenticate with](#personally-authenticating-with-gitlab). @@ -103,7 +103,7 @@ To connect a personal GitLab account, dbt Cloud developers should navigate to Yo If your GitLab account is not connected, you’ll see "No connected account". Select **Link** to begin the setup process. You’ll be redirected to GitLab, and asked to authorize dbt Cloud in a grant screen. - + Once you approve authorization, you will be redirected to dbt Cloud, and you should see your connected account. You're now ready to start developing in the dbt Cloud IDE or dbt Cloud CLI. 
diff --git a/website/docs/docs/cloud/git/import-a-project-by-git-url.md b/website/docs/docs/cloud/git/import-a-project-by-git-url.md index 83846bb1f0b..2ccaba1ec4d 100644 --- a/website/docs/docs/cloud/git/import-a-project-by-git-url.md +++ b/website/docs/docs/cloud/git/import-a-project-by-git-url.md @@ -37,7 +37,7 @@ If you use GitHub, you can import your repo directly using [dbt Cloud's GitHub A - After adding this key, dbt Cloud will be able to read and write files in your dbt project. - Refer to [Adding a deploy key in GitHub](https://github.blog/2015-06-16-read-only-deploy-keys/) - + ## GitLab @@ -52,7 +52,7 @@ If you use GitLab, you can import your repo directly using [dbt Cloud's GitLab A - After saving this SSH key, dbt Cloud will be able to read and write files in your GitLab repository. - Refer to [Adding a read only deploy key in GitLab](https://docs.gitlab.com/ee/ssh/#per-repository-deploy-keys) - + ## BitBucket @@ -60,7 +60,7 @@ If you use GitLab, you can import your repo directly using [dbt Cloud's GitLab A - Next, click the **Add key** button and paste in the deploy key generated by dbt Cloud for your repository. - After saving this SSH key, dbt Cloud will be able to read and write files in your BitBucket repository. - + ## AWS CodeCommit @@ -109,17 +109,17 @@ If you use Azure DevOps and you are on the dbt Cloud Enterprise plan, you can im 2. We recommend using a dedicated service user for the integration to ensure that dbt Cloud's connection to Azure DevOps is not interrupted by changes to user permissions. - + 3. Next, click the **+ New Key** button to create a new SSH key for the repository. - + 4. Select a descriptive name for the key and then paste in the deploy key generated by dbt Cloud for your repository. 5. After saving this SSH key, dbt Cloud will be able to read and write files in your Azure DevOps repository. 
- + ## Other git providers diff --git a/website/docs/docs/cloud/git/setup-azure.md b/website/docs/docs/cloud/git/setup-azure.md index 843371be6ea..b24ec577935 100644 --- a/website/docs/docs/cloud/git/setup-azure.md +++ b/website/docs/docs/cloud/git/setup-azure.md @@ -34,11 +34,11 @@ Many customers ask why they need to select Multitenant instead of Single tenant, 6. Add a redirect URI by selecting **Web** and, in the field, entering `https://YOUR_ACCESS_URL/complete/azure_active_directory`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cloud/regions-ip-addresses) for your region and plan. 7. Click **Register**. - + Here's what your app should look like before registering it: - + ## Add permissions to your new app @@ -51,7 +51,7 @@ Provide your new app access to Azure DevOps: 4. Select **Azure DevOps**. 5. Select the **user_impersonation** permission. This is the only permission available for Azure DevOps. - + ## Add another redirect URI @@ -63,7 +63,7 @@ You also need to add another redirect URI to your Azure AD application. This red `https://YOUR_ACCESS_URL/complete/azure_active_directory_service_user` 4. Click **Save**. - + @@ -77,7 +77,7 @@ If you have already connected your Azure DevOps account to Active Directory, the 4. Select the directory you want to connect. 5. Click **Connect**. - + ## Add your Azure AD app to dbt Cloud @@ -91,7 +91,7 @@ Once you connect your Azure AD app and Azure DevOps, you need to provide dbt Clo - **Application (client) ID:** Found in the Azure AD App. - **Client Secrets:** You need to first create a secret in the Azure AD App under **Client credentials**. Make sure to copy the **Value** field in the Azure AD App and paste it in the **Client Secret** field in dbt Cloud. You are responsible for the Azure AD app secret expiration and rotation. - **Directory(tenant) ID:** Found in the Azure AD App. - + Your Azure AD app should now be added to your dbt Cloud Account. 
People on your team who want to develop in the dbt Cloud IDE or dbt Cloud CLI can now personally [authorize Azure DevOps from their profiles](/docs/cloud/git/authenticate-azure). @@ -345,7 +345,7 @@ To connect the service user: 2. The admin should click **Link Azure Service User** in dbt Cloud. 3. The admin will be directed to Azure DevOps and must accept the Azure AD app's permissions. 4. Finally, the admin will be redirected to dbt Cloud, and the service user will be connected. - + Once connected, dbt Cloud displays the email address of the service user so you know which user's permissions are enabling headless actions in deployment environments. To change which account is connected, disconnect the profile in dbt Cloud, sign into the alternative Azure DevOps service account, and re-link the account in dbt Cloud. diff --git a/website/docs/docs/cloud/manage-access/auth0-migration.md b/website/docs/docs/cloud/manage-access/auth0-migration.md index a40bb006d06..610c97e8b74 100644 --- a/website/docs/docs/cloud/manage-access/auth0-migration.md +++ b/website/docs/docs/cloud/manage-access/auth0-migration.md @@ -17,11 +17,11 @@ If you have not yet configured SSO in dbt Cloud, refer instead to our setup guid The Auth0 migration feature is being rolled out incrementally to customers who have SSO features already enabled. When the migration option has been enabled on your account, you will see **SSO Updates Available** on the right side of the menu bar, near the settings icon. - + Alternatively, you can start the process from the **Settings** page in the **Single Sign-on** pane. Click the **Begin Migration** button to start. - + Once you have opted to begin the migration process, the following steps will vary depending on the configured identity provider. You can just skip to the section that's right for your environment. 
These steps only apply to customers going through the migration; new setups will use the existing [setup instructions](/docs/cloud/manage-access/sso-overview). @@ -48,15 +48,15 @@ Below are sample steps to update. You must complete all of them to ensure uninte Here is an example of an updated SAML 2.0 setup in Okta. - + 2. Save the configuration, and your SAML settings will look something like this: - + 3. Toggle the `Enable new SSO authentication` option to ensure the traffic is routed correctly. _The new SSO migration action is final and cannot be undone_ - + 4. Save the settings and test the new configuration using the SSO login URL provided on the settings page. @@ -68,17 +68,17 @@ Below are steps to update. You must complete all of them to ensure uninterrupted 1. Open the [Google Cloud console](https://console.cloud.google.com/) and select the project with your dbt Cloud single sign-on settings. From the project page **Quick Access**, select **APIs and Services** - + 2. Click **Credentials** from the left side pane and click the appropriate name from **OAuth 2.0 Client IDs** - + 3. In the **Client ID for Web application** window, find the **Authorized Redirect URIs** field and click **Add URI** and enter `https:///login/callback`. Click **Save** once you are done. - + 4. _You will need a person with Google Workspace admin privileges to complete these steps in dbt Cloud_. In dbt Cloud, navigate to the **Account Settings**, click on **Single Sign-on**, and then click **Edit** on the right side of the SSO pane. Toggle the **Enable New SSO Authentication** option and select **Save**. This will trigger an authorization window from Google that will require admin credentials. _The migration action is final and cannot be undone_. Once the authentication has gone through, test the new configuration using the SSO login URL provided on the settings page. 
@@ -88,7 +88,7 @@ You must complete the domain authorization before you toggle `Enable New SSO Aut ::: - + ## Azure Active Directory @@ -98,15 +98,15 @@ Below are steps to update. You must complete all of them to ensure uninterrupted 1. Click **App Registrations** on the left side menu. - + 2. Select the proper **dbt Cloud** app (name may vary) from the list. From the app overview, click on the hyperlink next to **Redirect URI** - + 3. In the **Web** pane with **Redirect URIs**, click **Add URI** and enter the appropriate `https:///login/callback`. Save the settings and verify it is counted in the updated app overview. - + 4. Navigate to the dbt Cloud environment and open the **Account Settings**. Click the **Single Sign-on** option from the left side menu and click the **Edit** option from the right side of the SSO pane. The **domain** field is the domain your organization uses to login to Azure AD. Toggle the **Enable New SSO Authentication** option and **Save**. _Once this option is enabled, it cannot be undone._ @@ -116,4 +116,4 @@ You must complete the domain authorization before you toggle `Enable New SSO Aut ::: - + diff --git a/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md b/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md index 63786f40bd8..76e16039ae8 100644 --- a/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md +++ b/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md @@ -130,8 +130,7 @@ to allocate for the user. If your account does not have an available license to allocate, you will need to add more licenses to your plan to complete the license change. - + ### Mapped configuration @@ -149,7 +148,7 @@ license. To assign Read-Only licenses to certain groups of users, create a new License Mapping for the Read-Only license type and include a comma separated list of IdP group names that should receive a Read-Only license at sign-in time. 
- Usage notes: diff --git a/website/docs/docs/cloud/manage-access/enterprise-permissions.md b/website/docs/docs/cloud/manage-access/enterprise-permissions.md index dcacda20deb..ac2d6258819 100644 --- a/website/docs/docs/cloud/manage-access/enterprise-permissions.md +++ b/website/docs/docs/cloud/manage-access/enterprise-permissions.md @@ -28,11 +28,11 @@ Role-Based Access Control (RBAC) is helpful for automatically assigning permissi 1. Click the gear icon to the top right and select **Account Settings**. From the **Team** section, click **Groups** - + 1. Select an existing group or create a new group to add RBAC. Name the group (this can be any name you like, but it's recommended to keep it consistent with the SSO groups). If you have configured SSO with SAML 2.0, you may have to use the GroupID instead of the name of the group. 2. Configure the SSO provider groups you want to add RBAC by clicking **Add** in the **SSO** section. These fields are case-sensitive and must match the source group formatting. 3. Configure the permissions for users within those groups by clicking **Add** in the **Access** section of the window. - + 4. When you've completed your configurations, click **Save**. Users will begin to populate the group automatically once they have signed in to dbt Cloud with their SSO credentials. diff --git a/website/docs/docs/cloud/manage-access/invite-users.md b/website/docs/docs/cloud/manage-access/invite-users.md index 21be7010a30..f79daebf45e 100644 --- a/website/docs/docs/cloud/manage-access/invite-users.md +++ b/website/docs/docs/cloud/manage-access/invite-users.md @@ -20,11 +20,11 @@ You must have proper permissions to invite new users: 1. In your dbt Cloud account, select the gear menu in the upper right corner and then select **Account Settings**. 2. From the left sidebar, select **Users**. - + 3. Click on **Invite Users**. - + 4. 
In the **Email Addresses** field, enter the email addresses of the users you would like to invite separated by comma, semicolon, or a new line. 5. Select the license type for the batch of users from the **License** dropdown. @@ -40,7 +40,7 @@ dbt Cloud generates and sends emails from `support@getdbt.com` to the specified The email contains a link to create an account. When the user clicks on this they will be brought to one of two screens depending on whether SSO is configured or not. - + @@ -48,7 +48,7 @@ The email contains a link to create an account. When the user clicks on this the The default settings send the email, the user clicks the link, and is prompted to create their account: - + @@ -56,7 +56,7 @@ The default settings send the email, the user clicks the link, and is prompted t If SSO is configured for the environment, the user clicks the link, is brought to a confirmation screen, and presented with a link to authenticate against the company's identity provider: - + @@ -73,4 +73,4 @@ Once the user completes this process, their email and user information will popu * What happens if I need to resend the invitation? _From the Users page, click on the invite record, and you will be presented with the option to resend the invitation._ * What can I do if I entered an email address incorrectly? _From the Users page, click on the invite record, and you will be presented with the option to revoke it. 
Once revoked, generate a new invitation to the correct email address._ - \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md b/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md index 87018b14d56..b0930af16f7 100644 --- a/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md +++ b/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md @@ -28,7 +28,7 @@ To get started, you need to create a client ID and secret for [authentication](h In the BigQuery console, navigate to **APIs & Services** and select **Credentials**: - + On the **Credentials** page, you can see your existing keys, client IDs, and service accounts. @@ -46,7 +46,7 @@ Fill in the application, replacing `YOUR_ACCESS_URL` with the [appropriate Acces Then click **Create** to create the BigQuery OAuth app and see the app client ID and secret values. These values are available even if you close the app screen, so this isn't the only chance you have to save them. - + @@ -59,7 +59,7 @@ Now that you have an OAuth app set up in BigQuery, you'll need to add the client - add the client ID and secret from the BigQuery OAuth app under the **OAuth2.0 Settings** section - + ### Authenticating to BigQuery Once the BigQuery OAuth app is set up for a dbt Cloud project, each dbt Cloud user will need to authenticate with BigQuery in order to use the IDE. To do so: @@ -68,10 +68,10 @@ Once the BigQuery OAuth app is set up for a dbt Cloud project, each dbt Cloud us - Select **Credentials**. - choose your project from the list - select **Authenticate BigQuery Account** - + You will then be redirected to BigQuery and asked to approve the drive, cloud platform, and BigQuery scopes, unless the connection is less privileged. - + Select **Allow**. This redirects you back to dbt Cloud. You should now be an authenticated BigQuery user, ready to use the dbt Cloud IDE. 
diff --git a/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md b/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md index 679133b7844..8dcbb42ffa7 100644 --- a/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md +++ b/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md @@ -60,7 +60,7 @@ Now that you have an OAuth app set up in Databricks, you'll need to add the clie - select **Connection** to edit the connection details - add the `OAuth Client ID` and `OAuth Client Secret` from the Databricks OAuth app under the **Optional Settings** section - + ### Authenticating to Databricks (dbt Cloud IDE developer) @@ -72,6 +72,6 @@ Once the Databricks connection via OAuth is set up for a dbt Cloud project, each - Select `OAuth` as the authentication method, and click **Save** - Finalize by clicking the **Connect Databricks Account** button - + You will then be redirected to Databricks and asked to approve the connection. This redirects you back to dbt Cloud. You should now be an authenticated Databricks user, ready to use the dbt Cloud IDE. diff --git a/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md b/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md index 5b9abb6058a..8e38a60dd27 100644 --- a/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md +++ b/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md @@ -68,7 +68,7 @@ from Enter the Client ID and Client Secret into dbt Cloud to complete the creation of your Connection. - + ### Authorize Developer Credentials @@ -76,7 +76,7 @@ Once Snowflake SSO is enabled, users on the project will be able to configure th ### SSO OAuth Flow Diagram - + Once a user has authorized dbt Cloud with Snowflake via their identity provider, Snowflake will return a Refresh Token to the dbt Cloud application. 
dbt Cloud is then able to exchange this refresh token for an Access Token which can then be used to open a Snowflake connection and execute queries in the dbt Cloud IDE on behalf of users. diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md b/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md index 19779baf615..1e45de190f5 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md @@ -52,7 +52,7 @@ Client Secret for use in dbt Cloud. | **Authorized domains** | `getdbt.com` (US multi-tenant) `getdbt.com` and `dbt.com`(US Cell 1) `dbt.com` (EMEA or AU) | If deploying into a VPC, use the domain for your deployment | | **Scopes** | `email, profile, openid` | The default scopes are sufficient | - + 6. Save the **Consent screen** settings to navigate back to the **Create OAuth client id** page. @@ -65,7 +65,7 @@ Client Secret for use in dbt Cloud. | **Authorized Javascript origins** | `https://YOUR_ACCESS_URL` | | **Authorized Redirect URIs** | `https://YOUR_AUTH0_URI/login/callback` | - + 8. Press "Create" to create your new credentials. A popup will appear with a **Client ID** and **Client Secret**. Write these down as you will need them later! @@ -77,7 +77,7 @@ Group Membership information from the GSuite API. To enable the Admin SDK for this project, navigate to the [Admin SDK Settings page](https://console.developers.google.com/apis/api/admin.googleapis.com/overview) and ensure that the API is enabled. - + ## Configuration in dbt Cloud @@ -99,7 +99,7 @@ Settings. Cloud by navigating to `https://YOUR_ACCESS_URL/enterprise-login/LOGIN-SLUG`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cloud/regions-ip-addresses) for your region and plan. The `LOGIN-SLUG` must be unique across all dbt Cloud accounts, so pick a slug that uniquely identifies your company. - + 3. 
Click **Save & Authorize** to authorize your credentials. You should be dropped into the GSuite OAuth flow and prompted to log into dbt Cloud with your work email address. If authentication is successful, you will be @@ -109,7 +109,7 @@ Settings. you do not see a `groups` entry in the IdP attribute list, consult the following Troubleshooting steps. - + If the verification information looks appropriate, then you have completed the configuration of GSuite SSO. diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md b/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md index ba925fa2c24..79c33a28450 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md @@ -426,7 +426,7 @@ To complete setup, follow the steps below in dbt Cloud: | Identity Provider Issuer | Paste the **Identity Provider Issuer** shown in the IdP setup instructions | | X.509 Certificate | Paste the **X.509 Certificate** shown in the IdP setup instructions;
**Note:** When the certificate expires, an Idp admin will have to generate a new one to be pasted into dbt Cloud for uninterrupted application access. | | Slug | Enter your desired login slug. | - 4. Click **Save** to complete setup for the SAML 2.0 integration. diff --git a/website/docs/docs/cloud/manage-access/sso-overview.md b/website/docs/docs/cloud/manage-access/sso-overview.md index b4954955c8c..938587d59b3 100644 --- a/website/docs/docs/cloud/manage-access/sso-overview.md +++ b/website/docs/docs/cloud/manage-access/sso-overview.md @@ -24,7 +24,7 @@ Once you configure SSO, even partially, you cannot disable or revert it. When yo The diagram below explains the basic process by which users are provisioned in dbt Cloud upon logging in with SSO. - + #### Diagram Explanation diff --git a/website/docs/docs/cloud/secure/ip-restrictions.md b/website/docs/docs/cloud/secure/ip-restrictions.md index 034b3a6c144..a0206ca038d 100644 --- a/website/docs/docs/cloud/secure/ip-restrictions.md +++ b/website/docs/docs/cloud/secure/ip-restrictions.md @@ -71,6 +71,6 @@ Once you are done adding all your ranges, IP restrictions can be enabled by sele Once enabled, when someone attempts to access dbt Cloud from a restricted IP, they will encounter one of the following messages depending on whether they use email & password or SSO login. - + - + diff --git a/website/docs/docs/cloud/secure/redshift-privatelink.md b/website/docs/docs/cloud/secure/redshift-privatelink.md index c42c703556b..da5312876fb 100644 --- a/website/docs/docs/cloud/secure/redshift-privatelink.md +++ b/website/docs/docs/cloud/secure/redshift-privatelink.md @@ -23,17 +23,17 @@ While Redshift Serverless does support Redshift-managed type VPC endpoints, this 1. On the running Redshift cluster, select the **Properties** tab. - + 2. In the **Granted accounts** section, click **Grant access**. - + 3. Enter the AWS account ID: `346425330055` - _NOTE: This account ID only applies to dbt Cloud Multi-Tenant environments. 
For Virtual Private/Single-Tenant account IDs please contact [Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support)._ 4. Choose **Grant access to all VPCs** —or— (optional) contact [Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support) for the appropriate regional VPC ID to designate in the **Grant access to specific VPCs** field. - + 5. Add the required information to the following template, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support): @@ -62,14 +62,14 @@ Creating an Interface VPC PrivateLink connection requires creating multiple AWS - **Standard Redshift** - Use IP addresses from the Redshift cluster’s **Network Interfaces** whenever possible. While IPs listed in the **Node IP addresses** section will work, they are also more likely to change. - + - There will likely be only one Network Interface (NI) to start, but if the cluster fails over to another availability zone (AZ), a new NI will also be created for that AZ. The NI IP from the original AZ will still work, but the new NI IP can also be added to the Target Group. If adding additional IPs, note that the NLB will also need to add the corresponding AZ. Once created, the NI(s) should stay the same (This is our observation from testing, but AWS does not officially document it). - **Redshift Serverless** - To find the IP addresses for Redshift Serverless instance locate and copy the endpoint (only the URL listed before the port) in the Workgroup configuration section of the AWS console for the instance. - + - From a command line run the command `nslookup ` using the endpoint found in the previous step and use the associated IP(s) for the Target Group. @@ -85,13 +85,13 @@ On the provisioned VPC endpoint service, click the **Allow principals** tab. Cli - Principal: `arn:aws:iam::346425330055:role/MTPL_Admin` - + ### 3. 
Obtain VPC Endpoint Service Name Once the VPC Endpoint Service is provisioned, you can find the service name in the AWS console by navigating to **VPC** → **Endpoint Services** and selecting the appropriate endpoint service. You can copy the service name field value and include it in your communication to dbt Cloud support. - + ### 4. Add the required information to the template below, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support): ``` diff --git a/website/docs/docs/cloud/secure/snowflake-privatelink.md b/website/docs/docs/cloud/secure/snowflake-privatelink.md index dd046259e4e..bc8f30a5566 100644 --- a/website/docs/docs/cloud/secure/snowflake-privatelink.md +++ b/website/docs/docs/cloud/secure/snowflake-privatelink.md @@ -27,7 +27,7 @@ Users connecting to Snowflake using SSO over a PrivateLink connection from dbt C - AWS account ID: `346425330055` - _NOTE: This account ID only applies to dbt Cloud Multi-Tenant environments. For Virtual Private/Single-Tenant account IDs please contact [Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support)._ - You will need to have `ACCOUNTADMIN` access to the Snowflake instance to submit a Support request. - + 2. After Snowflake has granted the requested access, run the Snowflake system function [SYSTEM$GET_PRIVATELINK_CONFIG](https://docs.snowflake.com/en/sql-reference/functions/system_get_privatelink_config.html) and copy the output. diff --git a/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md b/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md index e104ea8640c..7e85cbb8b11 100644 --- a/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md +++ b/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md @@ -16,7 +16,7 @@ To set up a job to generate docs: 1. In the top left, click **Deploy** and select **Jobs**. 2. Create a new job or select an existing job and click **Settings**. 3. 
Under "Execution Settings," select **Generate docs on run**. - + 4. Click **Save**. Proceed to [configure project documentation](#configure-project-documentation) so your project generates the documentation when this job runs. @@ -44,7 +44,7 @@ You configure project documentation to generate documentation when the job you s 3. Navigate to **Projects** and select the project that needs documentation. 4. Click **Edit**. 5. Under **Artifacts**, select the job that should generate docs when it runs. - + 6. Click **Save**. ## Generating documentation @@ -65,4 +65,4 @@ These generated docs always show the last fully successful run, which means that The dbt Cloud IDE makes it possible to view [documentation](/docs/collaborate/documentation) for your dbt project while your code is still in development. With this workflow, you can inspect and verify what your project's generated documentation will look like before your changes are released to production. - + diff --git a/website/docs/docs/collaborate/documentation.md b/website/docs/docs/collaborate/documentation.md index 1a989806851..b6636a84eee 100644 --- a/website/docs/docs/collaborate/documentation.md +++ b/website/docs/docs/collaborate/documentation.md @@ -29,7 +29,7 @@ Importantly, dbt also provides a way to add **descriptions** to models, columns, Here's an example docs site: - + ## Adding descriptions to your project To add descriptions to your project, use the `description:` key in the same files where you declare [tests](/docs/build/data-tests), like so: @@ -177,17 +177,17 @@ up to page views and sessions. ## Navigating the documentation site Using the docs interface, you can navigate to the documentation for a specific model. That might look something like this: - + Here, you can see a representation of the project structure, a markdown description for a model, and a list of all of the columns (with documentation) in the model. 
From a docs page, you can click the green button in the bottom-right corner of the webpage to expand a "mini-map" of your DAG. This pane (shown below) will display the immediate parents and children of the model that you're exploring. - + In this example, the `fct_subscription_transactions` model only has one direct parent. By clicking the "Expand" button in the top-right corner of the window, we can pivot the graph horizontally and view the full lineage for our model. This lineage is filterable using the `--select` and `--exclude` flags, which are consistent with the semantics of [model selection syntax](/reference/node-selection/syntax). Further, you can right-click to interact with the DAG, jump to documentation, or share links to your graph visualization with your coworkers. - + ## Deploying the documentation site diff --git a/website/docs/docs/collaborate/git/managed-repository.md b/website/docs/docs/collaborate/git/managed-repository.md index db8e9840ccd..6112b84d4c6 100644 --- a/website/docs/docs/collaborate/git/managed-repository.md +++ b/website/docs/docs/collaborate/git/managed-repository.md @@ -13,7 +13,7 @@ To set up a project with a managed repository: 4. Select **Managed**. 5. Enter a name for the repository. For example, "analytics" or "dbt-models." 6. Click **Create**. - + dbt Cloud will host and manage this repository for you. If in the future you choose to host this repository elsewhere, you can export the information from dbt Cloud at any time. diff --git a/website/docs/docs/collaborate/git/merge-conflicts.md b/website/docs/docs/collaborate/git/merge-conflicts.md index c3c19b1e2a1..133a096da9c 100644 --- a/website/docs/docs/collaborate/git/merge-conflicts.md +++ b/website/docs/docs/collaborate/git/merge-conflicts.md @@ -35,9 +35,9 @@ The dbt Cloud IDE will display: - The file name colored in red in the **Changes** section, with a warning icon. 
- If you press commit without resolving the conflict, the dbt Cloud IDE will prompt a pop up box with a list which files need to be resolved. - + - + ## Resolve merge conflicts @@ -51,7 +51,7 @@ You can seamlessly resolve merge conflicts that involve competing line changes i 6. Repeat this process for every file that has a merge conflict. - + :::info Edit conflict files - If you open the conflict file under **Changes**, the file name will display something like `model.sql (last commit)` and is fully read-only and cannot be edited.
@@ -67,6 +67,6 @@ When you've resolved all the merge conflicts, the last step would be to commit t 3. The dbt Cloud IDE will return to its normal state and you can continue developing! - + - + diff --git a/website/docs/docs/collaborate/git/pr-template.md b/website/docs/docs/collaborate/git/pr-template.md index ddb4948dad9..b85aa8a0d51 100644 --- a/website/docs/docs/collaborate/git/pr-template.md +++ b/website/docs/docs/collaborate/git/pr-template.md @@ -9,7 +9,7 @@ open a new Pull Request for the code changes. To enable this functionality, ensu that a PR Template URL is configured in the Repository details page in your Account Settings. If this setting is blank, the IDE will prompt users to merge the changes directly into their default branch. - + ### PR Template URL by git provider diff --git a/website/docs/docs/collaborate/model-performance.md b/website/docs/docs/collaborate/model-performance.md index 7ef675b4e1e..aeb18090751 100644 --- a/website/docs/docs/collaborate/model-performance.md +++ b/website/docs/docs/collaborate/model-performance.md @@ -27,7 +27,7 @@ Each data point links to individual models in Explorer. You can view historical metadata for up to the past three months. Select the time horizon using the filter, which defaults to a two-week lookback. - + ## The Model performance tab @@ -38,4 +38,4 @@ You can view trends in execution times, counts, and failures by using the Model Clicking on a data point reveals a table listing all job runs for that day, with each row providing a direct link to the details of a specific run. 
- \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/dbt-cloud-apis/service-tokens.md b/website/docs/docs/dbt-cloud-apis/service-tokens.md index b0b5fbd6cfe..a5a8a6c4807 100644 --- a/website/docs/docs/dbt-cloud-apis/service-tokens.md +++ b/website/docs/docs/dbt-cloud-apis/service-tokens.md @@ -110,7 +110,7 @@ On July 18, 2023, dbt Labs made critical infrastructure changes to service accou To rotate your token: 1. Navigate to **Account settings** and click **Service tokens** on the left side pane. 2. Verify the **Created** date for the token is _on or before_ July 18, 2023. - + 3. Click **+ New Token** on the top right side of the screen. Ensure the new token has the same permissions as the old one. 4. Copy the new token and replace the old one in your systems. Store it in a safe place, as it will not be available again once the creation screen is closed. 5. Delete the old token in dbt Cloud by clicking the **trash can icon**. _Only take this action after the new token is in place to avoid service disruptions_. diff --git a/website/docs/docs/dbt-cloud-apis/user-tokens.md b/website/docs/docs/dbt-cloud-apis/user-tokens.md index 77e536b12a5..5734f8ba35a 100644 --- a/website/docs/docs/dbt-cloud-apis/user-tokens.md +++ b/website/docs/docs/dbt-cloud-apis/user-tokens.md @@ -14,7 +14,7 @@ permissions of the user the that they were created for. You can find your User API token in the Profile page under the `API Access` label. 
- + ## FAQs diff --git a/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md b/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md index 0b588376c34..dc2cdb63748 100644 --- a/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md +++ b/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md @@ -12,4 +12,4 @@ Previously, when dbt Labs released a new [version](/docs/dbt-versions/core#how-d To see which version you are currently using and to upgrade, select **Deploy** in the top navigation bar and select **Environments**. Choose the preferred environment and click **Settings**. Click **Edit** to make a change to the current dbt version. dbt Labs recommends always using the latest version whenever possible to take advantage of new features and functionality. - \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md index 1aabe517076..38b017baa30 100644 --- a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md +++ b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md @@ -16,4 +16,4 @@ Highlights include: - Cleaner look and feel with iconography - Helpful tool tips - + diff --git a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md index d4d299b1d36..0bc4b76d0fc 100644 --- a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md +++ b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md @@ -8,7 +8,7 @@ tags: [May-2023, Scheduler] New usability and design 
improvements to the **Run History** dashboard in dbt Cloud are now available. These updates allow people to discover the information they need more easily by reducing the number of clicks, surfacing more relevant information, keeping people in flow state, and designing the look and feel to be more intuitive to use. - + Highlights include: diff --git a/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md b/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md index bdc89b4abde..9ceda7749cd 100644 --- a/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md +++ b/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md @@ -10,5 +10,5 @@ To help save compute time, new jobs will no longer be triggered to run by defaul For more information, refer to [Deploy jobs](/docs/deploy/deploy-jobs). - + diff --git a/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md b/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md index 2d0488d4488..41e1a5265ca 100644 --- a/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md +++ b/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md @@ -13,4 +13,4 @@ Large DAGs can take a long time (10 or more seconds, if not minutes) to render a The new button prevents large DAGs from rendering automatically. Instead, you can select **Render Lineage** to load the visualization. This should affect about 15% of the DAGs. 
- + diff --git a/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md b/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md index 307786c6b85..90e6ac72fea 100644 --- a/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md +++ b/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md @@ -10,4 +10,4 @@ We fixed an issue where a spotty internet connection could cause the “IDE sess We updated the health check logic so it now excludes client-side connectivity issues from the IDE session check. If you lose your internet connection, we no longer update the health-check state. Now, losing internet connectivity will no longer cause this unexpected message. - + diff --git a/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md b/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md index 9ff5986b4da..46c1f4bbd15 100644 --- a/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md +++ b/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md @@ -9,4 +9,4 @@ tags: [v1.1.46, March-02-2022] dbt Cloud now shows "waiting time" and "prep time" for a run, which used to be expressed in aggregate as "queue time". Waiting time captures the time dbt Cloud waits to run your job if there isn't an available run slot or if a previous run of the same job is still running. Prep time represents the time it takes dbt Cloud to ready your job to run in your cloud data warehouse. 
- + diff --git a/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md b/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md index 052611f66e6..75697d32d17 100644 --- a/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md +++ b/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md @@ -9,7 +9,7 @@ In dbt Cloud, both jobs and environments are configured to use a specific versio Navigate to the settings page of an environment, then click **edit**. Click the **dbt Version** dropdown bar and make your selection. From this list, you can select an available version of Core to associate with this environment. - + Be sure to save your changes before navigating away. @@ -17,7 +17,7 @@ Be sure to save your changes before navigating away. Each job in dbt Cloud can be configured to inherit parameters from the environment it belongs to. - + The example job seen in the screenshot above belongs to the environment "Prod". It inherits the dbt version of its environment as shown by the **Inherited from ENVIRONMENT_NAME (DBT_VERSION)** selection. You may also manually override the dbt version of a specific job to be any of the current Core releases supported by Cloud by selecting another option from the dropdown. @@ -275,7 +275,7 @@ Once you have your project compiling and running on the latest version of dbt in - + Then add a job to the new testing environment that replicates one of the production jobs your team relies on. If that job runs smoothly, you should be all set to merge your branch into main and change your development and deployment environments in your main dbt project to run off the newest version of dbt Core. diff --git a/website/docs/docs/deploy/artifacts.md b/website/docs/docs/deploy/artifacts.md index 9b3ae71e79c..7ecc05355a0 100644 --- a/website/docs/docs/deploy/artifacts.md +++ b/website/docs/docs/deploy/artifacts.md @@ -10,11 +10,11 @@ When running dbt jobs, dbt Cloud generates and saves *artifacts*. 
You can use th While running any job can produce artifacts, you should only associate one production job with a given project to produce the project's artifacts. You can designate this connection in the **Project details** page. To access this page, click the gear icon in the upper right, select **Account Settings**, select your project, and click **Edit** in the lower right. Under **Artifacts**, select the jobs you want to produce documentation and source freshness artifacts for. - + If you don't see your job listed, you might need to edit the job and select **Run source freshness** and **Generate docs on run**. - + When you add a production job to a project, dbt Cloud updates the content and provides links to the production documentation and source freshness artifacts it generated for that project. You can see these links by clicking **Deploy** in the upper left, selecting **Jobs**, and then selecting the production job. From the job page, you can select a specific run to see how artifacts were updated for that run only. @@ -25,10 +25,10 @@ When set up, dbt Cloud updates the **Documentation** link in the header tab so i Note that both the job's commands and the docs generate step (triggered by the **Generate docs on run** checkbox) must succeed during the job invocation for the project-level documentation to be populated or updated. - + ### Source Freshness As with Documentation, configuring a job for the Source Freshness artifact setting also updates the Data Sources link under **Deploy**. The new link points to the latest Source Freshness report for the selected job. 
- + diff --git a/website/docs/docs/deploy/ci-jobs.md b/website/docs/docs/deploy/ci-jobs.md index 149a6951fdc..9f0bafddaef 100644 --- a/website/docs/docs/deploy/ci-jobs.md +++ b/website/docs/docs/deploy/ci-jobs.md @@ -117,10 +117,10 @@ If you're experiencing any issues, review some of the common questions and answe First, make sure you have the native GitHub authentication, native GitLab authentication, or native Azure DevOps authentication set up depending on which git provider you use. After you have gone through those steps, go to Account Settings, select Projects and click on the project you'd like to reconnect through native GitHub, GitLab, or Azure DevOps auth. Then click on the repository link.

Once you're on the repository page, select Edit and then Disconnect Repository at the bottom.

- +

Confirm that you'd like to disconnect your repository. You should then see a new Configure a repository link in your old repository's place. Click through to the configuration page:

- +

Select the GitHub, GitLab, or Azure DevOps tab and reselect your repository. That should complete the setup of the project and enable you to set up a dbt Cloud CI job.
diff --git a/website/docs/docs/deploy/dashboard-status-tiles.md b/website/docs/docs/deploy/dashboard-status-tiles.md index 67aa1a93c33..4da0f859546 100644 --- a/website/docs/docs/deploy/dashboard-status-tiles.md +++ b/website/docs/docs/deploy/dashboard-status-tiles.md @@ -9,11 +9,11 @@ In dbt Cloud, the [Discovery API](/docs/dbt-cloud-apis/discovery-api) can power ## Functionality The dashboard status tile looks like this: - + The data freshness check fails if any sources feeding into the exposure are stale. The data quality check fails if any dbt tests fail. A failure state could look like this: - + Clicking into **see details** from the Dashboard Status Tile takes you to a landing page where you can learn more about the specific sources, models, and tests feeding into this exposure. @@ -60,7 +60,7 @@ Looker does not allow you to directly embed HTML and instead requires creating a - Once you have set up your custom visualization, you can use it on any dashboard! You can configure it with the exposure name, jobID, and token relevant to that dashboard. - + ### Tableau Tableau does not require you to embed an iFrame. You only need to use a Web Page object on your Tableau Dashboard and a URL in the following format: diff --git a/website/docs/docs/deploy/deploy-jobs.md b/website/docs/docs/deploy/deploy-jobs.md index cee6e245359..90ba0c7796c 100644 --- a/website/docs/docs/deploy/deploy-jobs.md +++ b/website/docs/docs/deploy/deploy-jobs.md @@ -80,7 +80,7 @@ dbt Cloud uses [Coordinated Universal Time](https://en.wikipedia.org/wiki/Coordi To fully customize the scheduling of your job, choose the **Custom cron schedule** option and use the cron syntax. With this syntax, you can specify the minute, hour, day of the month, month, and day of the week, allowing you to set up complex schedules like running a job on the first Monday of each month. - + Use tools such as [crontab.guru](https://crontab.guru/) to generate the correct cron syntax. 
This tool allows you to input cron snippets and returns their plain English translations. diff --git a/website/docs/docs/deploy/deployment-tools.md b/website/docs/docs/deploy/deployment-tools.md index cca2368f38a..64fcb1dadae 100644 --- a/website/docs/docs/deploy/deployment-tools.md +++ b/website/docs/docs/deploy/deployment-tools.md @@ -19,8 +19,8 @@ If your organization is using [Airflow](https://airflow.apache.org/), there are Installing the [dbt Cloud Provider](https://airflow.apache.org/docs/apache-airflow-providers-dbt-cloud/stable/index.html) to orchestrate dbt Cloud jobs. This package contains multiple Hooks, Operators, and Sensors to complete various actions within dbt Cloud. - - + + @@ -71,7 +71,7 @@ If your organization is using [Prefect](https://www.prefect.io/), the way you wi - As jobs are executing, you can poll dbt to see whether or not the job completes without failures, through the [Prefect user interface (UI)](https://docs.prefect.io/ui/overview/). - + diff --git a/website/docs/docs/deploy/source-freshness.md b/website/docs/docs/deploy/source-freshness.md index 2f9fe6bc007..3c4866cd084 100644 --- a/website/docs/docs/deploy/source-freshness.md +++ b/website/docs/docs/deploy/source-freshness.md @@ -6,7 +6,7 @@ description: "Validate that data freshness meets expectations and alert if stale dbt Cloud provides a helpful interface around dbt's [source data freshness](/docs/build/sources#snapshotting-source-data-freshness) calculations. When a dbt Cloud job is configured to snapshot source data freshness, dbt Cloud will render a user interface showing you the state of the most recent snapshot. This interface is intended to help you determine if your source data freshness is meeting the service level agreement (SLA) that you've defined for your organization. 
- + ### Enabling source freshness snapshots @@ -15,7 +15,7 @@ dbt Cloud provides a helpful interface around dbt's [source data freshness](/doc - Select the **Generate docs on run** checkbox to automatically [generate project docs](/docs/collaborate/build-and-view-your-docs#set-up-a-documentation-job). - Select the **Run source freshness** checkbox to enable [source freshness](#checkbox) as the first step of the job. - + To enable source freshness snapshots, firstly make sure to configure your sources to [snapshot freshness information](/docs/build/sources#snapshotting-source-data-freshness). You can add source freshness to the list of commands in the job run steps or enable the checkbox. However, you can expect different outcomes when you configure a job by selecting the **Run source freshness** checkbox compared to adding the command to the run steps. @@ -27,7 +27,7 @@ Review the following options and outcomes: | **Add as a run step** | Add the `dbt source freshness` command to a job anywhere in your list of run steps. However, if your source data is out of date — this step will "fail", and subsequent steps will not run. dbt Cloud will trigger email notifications (if configured) based on the end state of this step.

You can create a new job to snapshot source freshness.

If you *do not* want your models to run if your source data is out of date, then it could be a good idea to run `dbt source freshness` as the first step in your job. Otherwise, we recommend adding `dbt source freshness` as the last step in the job, or creating a separate job just for this task. | - + ### Source freshness snapshot frequency diff --git a/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md b/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md index f41bceab12d..e6a50443837 100644 --- a/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md +++ b/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md @@ -16,11 +16,11 @@ New dbt Cloud accounts will automatically be created with a Development Environm To create a development environment, choose **Deploy** > **Environments** from the top left. Then, click **Create Environment**. - + Enter an environment **Name** that would help you identify it among your other environments (for example, `Nate's Development Environment`). Choose **Development** as the **Environment Type**. You can also select which **dbt Version** to use at this time. For compatibility reasons, we recommend that you select the same dbt version that you plan to use in your deployment environment. Finally, click **Save** to finish creating your development environment. - + ### Setting up developer credentials @@ -28,14 +28,14 @@ The IDE uses *developer credentials* to connect to your database. These develope New dbt Cloud accounts should have developer credentials created automatically as a part of Project creation in the initial application setup. - + New users on existing accounts *might not* have their development credentials already configured. To manage your development credentials: 1. 
Navigate to your **Credentials** under **Your Profile** settings, which you can access at `https://YOUR_ACCESS_URL/settings/profile#credentials`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cloud/regions-ip-addresses) for your region and plan. 2. Select the relevant project in the list. After entering your developer credentials, you'll be able to access the dbt IDE. - + ### Compiling and running SQL diff --git a/website/docs/faqs/API/rotate-token.md b/website/docs/faqs/API/rotate-token.md index 4470de72d5a..0b808fa9176 100644 --- a/website/docs/faqs/API/rotate-token.md +++ b/website/docs/faqs/API/rotate-token.md @@ -19,7 +19,7 @@ To automatically rotate your API key: 2. Select **API Access** from the lefthand side. 3. In the **API** pane, click `Rotate`. - + diff --git a/website/docs/faqs/Accounts/change-users-license.md b/website/docs/faqs/Accounts/change-users-license.md index 8755b946126..ed12ba5dc14 100644 --- a/website/docs/faqs/Accounts/change-users-license.md +++ b/website/docs/faqs/Accounts/change-users-license.md @@ -10,10 +10,10 @@ To change the license type for a user from `developer` to `read-only` or `IT` in 1. From dbt Cloud, click the gear icon at the top right and select **Account Settings**. - + 2. In **Account Settings**, select **Users** under **Teams**. 3. Select the user you want to remove, and click **Edit** in the bottom of their profile. 4. For the **License** option, choose **Read-only** or **IT** (from **Developer**), and click **Save**. - + diff --git a/website/docs/faqs/Accounts/cloud-upgrade-instructions.md b/website/docs/faqs/Accounts/cloud-upgrade-instructions.md index d16651a944c..ef2ff8e4cd3 100644 --- a/website/docs/faqs/Accounts/cloud-upgrade-instructions.md +++ b/website/docs/faqs/Accounts/cloud-upgrade-instructions.md @@ -32,7 +32,7 @@ To unlock your account and select a plan, review the following guidance per plan 3. Confirm your plan selection on the pop up message. 4. 
This automatically unlocks your dbt Cloud account, and you can now enjoy the benefits of the Developer plan. 🎉 - + ### Team plan @@ -42,7 +42,7 @@ To unlock your account and select a plan, review the following guidance per plan 4. Enter your payment details and click **Save**. 5. This automatically unlocks your dbt Cloud account, and you can now enjoy the benefits of the Team plan. 🎉 - + ### Enterprise plan @@ -50,7 +50,7 @@ To unlock your account and select a plan, review the following guidance per plan 2. Click **Contact Sales** on the right. This opens a chat window for you to contact the dbt Cloud Support team, who will connect you to our Sales team. 3. Once you submit your request, our Sales team will contact you with more information. - + 4. Alternatively, you can [contact](https://www.getdbt.com/contact/) our Sales team directly to chat about how dbt Cloud can help you and your team. diff --git a/website/docs/faqs/Project/delete-a-project.md b/website/docs/faqs/Project/delete-a-project.md index 5fde3fee9cd..21f16cbfaec 100644 --- a/website/docs/faqs/Project/delete-a-project.md +++ b/website/docs/faqs/Project/delete-a-project.md @@ -9,10 +9,10 @@ To delete a project in dbt Cloud, you must be the account owner or have admin pr 1. From dbt Cloud, click the gear icon at the top right corner and select **Account Settings**. - + 2. In **Account Settings**, select **Projects**. Click the project you want to delete from the **Projects** page. 3. Click the edit icon in the lower right-hand corner of the **Project Details**. A **Delete** option will appear on the left side of the same details view. 4. Select **Delete**. Confirm the action to immediately delete the user without additional password prompts. There will be no account password prompt, and the project is deleted immediately after confirmation. Once a project is deleted, this action cannot be undone. 
- + diff --git a/website/docs/guides/adapter-creation.md b/website/docs/guides/adapter-creation.md index 28e0e8253ad..12bda4726f9 100644 --- a/website/docs/guides/adapter-creation.md +++ b/website/docs/guides/adapter-creation.md @@ -107,7 +107,7 @@ A set of *materializations* and their corresponding helper macros defined in dbt Below is a diagram of how dbt-postgres, the adapter at the center of dbt-core, works. - + ## Prerequisites @@ -1225,17 +1225,17 @@ This can vary substantially depending on the nature of the release but a good ba Breaking this down: - Visually distinctive announcement - make it clear this is a release - + - Short written description of what is in the release - + - Links to additional resources - + - Implementation instructions: - + - Future plans - + - Contributor recognition (if applicable) - + ## Verify a new adapter diff --git a/website/docs/guides/bigquery-qs.md b/website/docs/guides/bigquery-qs.md index 4f461a3cf3a..d961a27018a 100644 --- a/website/docs/guides/bigquery-qs.md +++ b/website/docs/guides/bigquery-qs.md @@ -56,7 +56,7 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen Click **Run**, then check for results from the queries. For example:
- +
2. Create new datasets from the [BigQuery Console](https://console.cloud.google.com/bigquery). For more information, refer to [Create datasets](https://cloud.google.com/bigquery/docs/datasets#create-dataset) in the Google Cloud docs. Datasets in BigQuery are equivalent to schemas in a traditional database. On the **Create dataset** page: - **Dataset ID** — Enter a name that fits the purpose. This name is used like schema in fully qualified references to your database objects such as `database.schema.table`. As an example for this guide, create one for `jaffle_shop` and another one for `stripe` afterward. @@ -64,7 +64,7 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen - **Enable table expiration** — Leave it unselected (the default). The default for the billing table expiration is 60 days. Because billing isn’t enabled for this project, GCP defaults to deprecating tables. - **Google-managed encryption key** — This option is available under **Advanced options**. Allow Google to manage encryption (the default).
- +
3. After you create the `jaffle_shop` dataset, create one for `stripe` with all the same values except for **Dataset ID**. diff --git a/website/docs/guides/codespace-qs.md b/website/docs/guides/codespace-qs.md index b28b0ddaacf..c399eb494a9 100644 --- a/website/docs/guides/codespace-qs.md +++ b/website/docs/guides/codespace-qs.md @@ -35,7 +35,7 @@ dbt Labs provides a [GitHub Codespace](https://docs.github.com/en/codespaces/ove 1. Click **Code** (at the top of the new repository’s page). Under the **Codespaces** tab, choose **Create codespace on main**. Depending on how you've configured your computer's settings, this either opens a new browser tab with the Codespace development environment with VSCode running in it or opens a new VSCode window with the codespace in it. 1. Wait for the codespace to finish building by waiting for the `postCreateCommand` command to complete; this can take several minutes: - + When this command completes, you can start using the codespace development environment. The terminal the command ran in will close and you will get a prompt in a brand new terminal. diff --git a/website/docs/guides/custom-cicd-pipelines.md b/website/docs/guides/custom-cicd-pipelines.md index 1778098f752..b21fe13b19b 100644 --- a/website/docs/guides/custom-cicd-pipelines.md +++ b/website/docs/guides/custom-cicd-pipelines.md @@ -144,7 +144,7 @@ In Azure: - Click *OK* and then *Save* to save the variable - Save your new Azure pipeline - + diff --git a/website/docs/guides/databricks-qs.md b/website/docs/guides/databricks-qs.md index cb01daec394..98c215382f6 100644 --- a/website/docs/guides/databricks-qs.md +++ b/website/docs/guides/databricks-qs.md @@ -41,7 +41,7 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen 1. Use your existing account or sign up for a Databricks account at [Try Databricks](https://databricks.com/). Complete the form with your user information.
- +
2. For the purpose of this tutorial, you will be selecting AWS as our cloud provider but if you use Azure or GCP internally, please choose one of them. The setup process will be similar. @@ -49,28 +49,28 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen 4. After setting up your password, you will be guided to choose a subscription plan. Select the `Premium` or `Enterprise` plan to access the SQL Compute functionality required for using the SQL warehouse for dbt. We have chosen `Premium` for this tutorial. Click **Continue** after selecting your plan.
- +
5. Click **Get Started** when you come to this below page and then **Confirm** after you validate that you have everything needed.
- +
- +
6. Now it's time to create your first workspace. A Databricks workspace is an environment for accessing all of your Databricks assets. The workspace organizes objects like notebooks, SQL warehouses, clusters, etc into one place. Provide the name of your workspace and choose the appropriate AWS region and click **Start Quickstart**. You might get the checkbox of **I have data in S3 that I want to query with Databricks**. You do not need to check this off for the purpose of this tutorial.
- +
7. By clicking on `Start Quickstart`, you will be redirected to AWS and asked to log in if you haven’t already. After logging in, you should see a page similar to this.
- +
:::tip @@ -79,16 +79,16 @@ If you get a session error and don’t get redirected to this page, you can go b 8. There is no need to change any of the pre-filled out fields in the Parameters. Just add in your Databricks password under **Databricks Account Credentials**. Check off the Acknowledgement and click **Create stack**.
- +
- +
10. Go back to the Databricks tab. You should see that your workspace is ready to use.
- +
11. Now let’s jump into the workspace. Click **Open** and log into the workspace using the same login as you used to log into the account. @@ -101,7 +101,7 @@ If you get a session error and don’t get redirected to this page, you can go b 2. First we need a SQL warehouse. Find the drop down menu and toggle into the SQL space.
- +
3. We will be setting up a SQL warehouse now. Select **SQL Warehouses** from the left hand side console. You will see that a default SQL Warehouse exists. @@ -109,12 +109,12 @@ If you get a session error and don’t get redirected to this page, you can go b 5. Once the SQL Warehouse is up, click **New** and then **File upload** on the dropdown menu.
- +
6. Let's load the Jaffle Shop Customers data first. Drop in the `jaffle_shop_customers.csv` file into the UI.
- +
7. Update the Table Attributes at the top: @@ -128,7 +128,7 @@ If you get a session error and don’t get redirected to this page, you can go b - LAST_NAME = string
- +
8. Click **Create** on the bottom once you’re done. @@ -136,11 +136,11 @@ If you get a session error and don’t get redirected to this page, you can go b 9. Now let’s do the same for `Jaffle Shop Orders` and `Stripe Payments`.
- +
- +
10. Once that's done, make sure you can query the training data. Navigate to the `SQL Editor` through the left hand menu. This will bring you to a query editor. @@ -153,7 +153,7 @@ If you get a session error and don’t get redirected to this page, you can go b ```
- +
12. To ensure any users who might be working on your dbt project has access to your object, run this command. diff --git a/website/docs/guides/dbt-python-snowpark.md b/website/docs/guides/dbt-python-snowpark.md index 110445344e9..fce0ad692f6 100644 --- a/website/docs/guides/dbt-python-snowpark.md +++ b/website/docs/guides/dbt-python-snowpark.md @@ -51,19 +51,19 @@ Overall we are going to set up the environments, build scalable pipelines in dbt 1. Log in to your trial Snowflake account. You can [sign up for a Snowflake Trial Account using this form](https://signup.snowflake.com/) if you don’t have one. 2. Ensure that your account is set up using **AWS** in the **US East (N. Virginia)**. We will be copying the data from a public AWS S3 bucket hosted by dbt Labs in the us-east-1 region. By ensuring our Snowflake environment setup matches our bucket region, we avoid any multi-region data copy and retrieval latency issues. - + 3. After creating your account and verifying it from your sign-up email, Snowflake will direct you back to the UI called Snowsight. 4. When Snowsight first opens, your window should look like the following, with you logged in as the ACCOUNTADMIN with demo worksheets open: - + 5. Navigate to **Admin > Billing & Terms**. Click **Enable > Acknowledge & Continue** to enable Anaconda Python Packages to run in Snowflake. - + - + 6. Finally, create a new Worksheet by selecting **+ Worksheet** in the upper right corner. @@ -80,7 +80,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 3. Rename the worksheet to `data setup script` since we will be placing code in this worksheet to ingest the Formula 1 data. Make sure you are still logged in as the **ACCOUNTADMIN** and select the **COMPUTE_WH** warehouse. - + 4. Copy the following code into the main body of the Snowflake worksheet. 
You can also find this setup script under the `setup` folder in the [Git repository](https://github.com/dbt-labs/python-snowpark-formula1/blob/main/setup/setup_script_s3_to_snowflake.sql). The script is long since it's bring in all of the data we'll need today! @@ -233,7 +233,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 5. Ensure all the commands are selected before running the query — an easy way to do this is to use Ctrl-a to highlight all of the code in the worksheet. Select **run** (blue triangle icon). Notice how the dot next to your **COMPUTE_WH** turns from gray to green as you run the query. The **status** table is the final table of all 8 tables loaded in. - + 6. Let’s unpack that pretty long query we ran into component parts. We ran this query to load in our 8 Formula 1 tables from a public S3 bucket. To do this, we: - Created a new database called `formula1` and a schema called `raw` to place our raw (untransformed) data into. @@ -244,7 +244,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 7. Now let's take a look at some of our cool Formula 1 data we just loaded up! 1. Create a new worksheet by selecting the **+** then **New Worksheet**. - + 2. Navigate to **Database > Formula1 > RAW > Tables**. 3. Query the data using the following code. There are only 76 rows in the circuits table, so we don’t need to worry about limiting the amount of data we query. @@ -256,7 +256,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 5. Review the query results, you should see information about Formula 1 circuits, starting with Albert Park in Australia! 6. Finally, ensure you have all 8 tables starting with `CIRCUITS` and ending with `STATUS`. Now we are ready to connect into dbt Cloud! - + ## Configure dbt Cloud @@ -264,19 +264,19 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 2. 
Navigate out of your worksheet back by selecting **home**. 3. In Snowsight, confirm that you are using the **ACCOUNTADMIN** role. 4. Navigate to the **Admin** **> Partner Connect**. Find **dbt** either by using the search bar or navigating the **Data Integration**. Select the **dbt** tile. - + 5. You should now see a new window that says **Connect to dbt**. Select **Optional Grant** and add the `FORMULA1` database. This will grant access for your new dbt user role to the FORMULA1 database. - + 6. Ensure the `FORMULA1` is present in your optional grant before clicking **Connect**.  This will create a dedicated dbt user, database, warehouse, and role for your dbt Cloud trial. - + 7. When you see the **Your partner account has been created** window, click **Activate**. 8. You should be redirected to a dbt Cloud registration page. Fill out the form. Make sure to save the password somewhere for login in the future. - + 9. Select **Complete Registration**. You should now be redirected to your dbt Cloud account, complete with a connection to your Snowflake account, a deployment and a development environment, and a sample job. @@ -286,43 +286,43 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 1. First we are going to change the name of our default schema to where our dbt models will build. By default, the name is `dbt_`. We will change this to `dbt_` to create your own personal development schema. To do this, select **Profile Settings** from the gear icon in the upper right. - + 2. Navigate to the **Credentials** menu and select **Partner Connect Trial**, which will expand the credentials menu. - + 3. Click **Edit** and change the name of your schema from `dbt_` to `dbt_YOUR_NAME` replacing `YOUR_NAME` with your initials and name (`hwatson` is used in the lab screenshots). Be sure to click **Save** for your changes! - + 4. We now have our own personal development schema, amazing! 
When we run our first dbt models they will build into this schema. 5. Let’s open up dbt Cloud’s Integrated Development Environment (IDE) and familiarize ourselves. Choose **Develop** at the top of the UI. 6. When the IDE is done loading, click **Initialize dbt project**. The initialization process creates a collection of files and folders necessary to run your dbt project. - + 7. After the initialization is finished, you can view the files and folders in the file tree menu. As we move through the workshop we'll be sure to touch on a few key files and folders that we'll work with to build out our project. 8. Next click **Commit and push** to commit the new files and folders from the initialize step. We always want our commit messages to be relevant to the work we're committing, so be sure to provide a message like `initialize project` and select **Commit Changes**. - + - + 9. [Committing](https://www.atlassian.com/git/tutorials/saving-changes/git-commit) your work here will save it to the managed git repository that was created during the Partner Connect signup. This initial commit is the only commit that will be made directly to our `main` branch and from *here on out we'll be doing all of our work on a development branch*. This allows us to keep our development work separate from our production code. 10. There are a couple of key features to point out about the IDE before we get to work. It is a text editor, an SQL and Python runner, and a CLI with Git version control all baked into one package! This allows you to focus on editing your SQL and Python files, previewing the results with the SQL runner (it even runs Jinja!), and building models at the command line without having to move between different applications. The Git workflow in dbt Cloud allows both Git beginners and experts alike to be able to easily version control all of their work with a couple clicks. - + 11. Let's run our first dbt models! 
Two example models are included in your dbt project in the `models/examples` folder that we can use to illustrate how to run dbt at the command line. Type `dbt run` into the command line and click **Enter** on your keyboard. When the run bar expands you'll be able to see the results of the run, where you should see the run complete successfully. - + 12. The run results allow you to see the code that dbt compiles and sends to Snowflake for execution. To view the logs for this run, select one of the model tabs using the  **>** icon and then **Details**. If you scroll down a bit you'll be able to see the compiled code and how dbt interacts with Snowflake. Given that this run took place in our development environment, the models were created in your development schema. - + 13. Now let's switch over to Snowflake to confirm that the objects were actually created. Click on the three dots **…** above your database objects and then **Refresh**. Expand the **PC_DBT_DB** database and you should see your development schema. Select the schema, then **Tables**  and **Views**. Now you should be able to see `MY_FIRST_DBT_MODEL` as a table and `MY_SECOND_DBT_MODEL` as a view. - + ## Create branch and set up project configs @@ -414,15 +414,15 @@ dbt Labs has developed a [project structure guide](/best-practices/how-we-struct 1. In your file tree, use your cursor and hover over the `models` subdirectory, click the three dots **…** that appear to the right of the folder name, then select **Create Folder**. We're going to add two new folders to the file path, `staging` and `formula1` (in that order) by typing `staging/formula1` into the file path. - - + + - If you click into your `models` directory now, you should see the new `staging` folder nested within `models` and the `formula1` folder nested within `staging`. 2. Create two additional folders the same as the last step. Within the `models` subdirectory, create new directories `marts/core`. 3. 
We will need to create a few more folders and subfolders using the UI. After you create all the necessary folders, your folder tree should look like this when it's all done: - + Remember you can always reference the entire project in [GitHub](https://github.com/dbt-labs/python-snowpark-formula1/tree/python-formula1) to view the complete folder and file strucutre. @@ -742,21 +742,21 @@ The next step is to set up the staging models for each of the 8 source tables. G After the source and all the staging models are complete for each of the 8 tables, your staging folder should look like this: - + 1. It’s a good time to delete our example folder since these two models are extraneous to our formula1 pipeline and `my_first_model` fails a `not_null` test that we won’t spend time investigating. dbt Cloud will warn us that this folder will be permanently deleted, and we are okay with that so select **Delete**. - + 1. Now that the staging models are built and saved, it's time to create the models in our development schema in Snowflake. To do this we're going to enter into the command line `dbt build` to run all of the models in our project, which includes the 8 new staging models and the existing example models. Your run should complete successfully and you should see green checkmarks next to all of your models in the run results. We built our 8 staging models as views and ran 13 source tests that we configured in the `f1_sources.yml` file with not that much code, pretty cool! - + Let's take a quick look in Snowflake, refresh database objects, open our development schema, and confirm that the new models are there. If you can see them, then we're good to go! - + Before we move onto the next section, be sure to commit your new models to your Git branch. Click **Commit and push** and give your commit a message like `profile, sources, and staging setup` before moving on. @@ -1055,7 +1055,7 @@ By now, we are pretty good at creating new files in the correct directories so w 1. 
Let’s talk about our lineage so far. It’s looking good 😎. We’ve shown how SQL can be used to make data type, column name changes, and handle hierarchical joins really well; all while building out our automated lineage! - + 1. Time to **Commit and push** our changes and give your commit a message like `intermediate and fact models` before moving on. @@ -1128,7 +1128,7 @@ First, we want to find out: which constructor had the fastest pit stops in 2021? - The `snowflake-snowpark-python` library has been picked up to execute our Python code. Even though this wasn’t explicitly stated this is picked up by the dbt class object because we need our Snowpark package to run Python! Python models take a bit longer to run than SQL models, however we could always speed this up by using [Snowpark-optimized Warehouses](https://docs.snowflake.com/en/user-guide/warehouses-snowpark-optimized.html) if we wanted to. Our data is sufficiently small, so we won’t worry about creating a separate warehouse for Python versus SQL files today. - + The rest of our **Details** output gives us information about how dbt and Snowpark for Python are working together to define class objects and apply a specific set of methods to run our models. @@ -1142,7 +1142,7 @@ First, we want to find out: which constructor had the fastest pit stops in 2021? ``` and preview the output: - + Not only did Red Bull have the fastest average pit stops by nearly 40 seconds, they also had the smallest standard deviation, meaning they are both fastest and most consistent teams in pit stops. By using the `.describe()` method we were able to avoid verbose SQL requiring us to create a line of code per column and repetitively use the `PERCENTILE_COUNT()` function. @@ -1187,7 +1187,7 @@ First, we want to find out: which constructor had the fastest pit stops in 2021? in the command bar. 12. Once again previewing the output of our data using the same steps for our `fastest_pit_stops_by_constructor` model. 
- + We can see that it looks like lap times are getting consistently faster over time. Then in 2010 we see an increase occur! Using outside subject matter context, we know that significant rule changes were introduced to Formula 1 in 2010 and 2011 causing slower lap times. @@ -1314,7 +1314,7 @@ At a high level we’ll be: - The `.apply()` function in the pandas library is used to apply a function to a specified axis of a DataFrame or a Series. In our case the function we used was our lambda function! - The `.apply()` function takes two arguments: the first is the function to be applied, and the second is the axis along which the function should be applied. The axis can be specified as 0 for rows or 1 for columns. We are using the default value of 0 so we aren’t explicitly writing it in the code. This means that the function will be applied to each *row* of the DataFrame or Series. 6. Let’s look at the preview of our clean dataframe after running our `ml_data_prep` model: - + ### Covariate encoding @@ -1565,7 +1565,7 @@ If you haven’t seen code like this before or use joblib files to save machine - Right now our model is only in memory, so we need to use our nifty function `save_file` to save our model file to our Snowflake stage. We save our model as a joblib file so Snowpark can easily call this model object back to create predictions. We really don’t need to know much else as a data practitioner unless we want to. It’s worth noting that joblib files aren’t able to be queried directly by SQL. To do this, we would need to transform the joblib file to an SQL querable format such as JSON or CSV (out of scope for this workshop). - Finally we want to return our dataframe, but create a new column indicating what rows were used for training and those for training. 5. Viewing our output of this model: - + 6. 
Let’s pop back over to Snowflake and check that our logistic regression model has been stored in our `MODELSTAGE` using the command: @@ -1573,10 +1573,10 @@ If you haven’t seen code like this before or use joblib files to save machine list @modelstage ``` - + 7. To investigate the commands run as part of `train_test_position` script, navigate to Snowflake query history to view it **Activity > Query History**. We can view the portions of query that we wrote such as `create or replace stage MODELSTAGE`, but we also see additional queries that Snowflake uses to interpret python code. - + ### Predicting on new data @@ -1731,7 +1731,7 @@ Since the output of our Python models are tables, we can test SQL and Python mod ### Generic tests 1. To implement generic out-of-the-box tests dbt comes with, we can use YAML files to specify information about our models. To add generic tests to our aggregates model, create a file called `aggregates.yml`, copy the code block below into the file, and save. - + ```yaml version: 2 @@ -1762,7 +1762,7 @@ Since the output of our Python models are tables, we can test SQL and Python mod ### Using macros for testing 1. Under your `macros` folder, create a new file and name it `test_all_values_gte_zero.sql`. Copy the code block below and save the file. For clarity, “gte” is an abbreviation for greater than or equal to. - + ```sql {% macro test_all_values_gte_zero(table, column) %} @@ -1776,7 +1776,7 @@ Since the output of our Python models are tables, we can test SQL and Python mod 3. We use the `{% macro %}` to indicate the start of the macro and `{% endmacro %}` for the end. The text after the beginning of the macro block is the name we are giving the macro to later call it. In this case, our macro is called `test_all_values_gte_zero`. Macros take in *arguments* to pass through, in this case the `table` and the `column`. 
In the body of the macro, we see an SQL statement that is using the `ref` function to dynamically select the table and then the column. You can always view macros without having to run them by using `dbt run-operation`. You can learn more [here](https://docs.getdbt.com/reference/commands/run-operation). 4. Great, now we want to reference this macro as a test! Let’s create a new test file called `macro_pit_stops_mean_is_positive.sql` in our `tests` folder. - + 5. Copy the following code into the file and save: @@ -1805,7 +1805,7 @@ These tests are defined in `.sql` files, typically in your `tests` directory (as Let’s add a custom test that asserts that the moving average of the lap time over the last 5 years is greater than zero (it’s impossible to have time less than 0!). It is easy to assume if this is not the case the data has been corrupted. 1. Create a file `lap_times_moving_avg_assert_positive_or_null.sql` under the `tests` folder. - + 2. Copy the following code and save the file: @@ -1841,11 +1841,11 @@ Let’s add a custom test that asserts that the moving average of the lap time o dbt test --select fastest_pit_stops_by_constructor lap_times_moving_avg ``` - + 3. All 4 of our tests passed (yay for clean data)! To understand the SQL being run against each of our tables, we can click into the details of the test. 4. Navigating into the **Details** of the `unique_fastest_pit_stops_by_constructor_name`, we can see that each line `constructor_name` should only have one row. - + ## Document your dbt project @@ -1865,17 +1865,17 @@ To start, let’s look back at our `intermediate.md` file. We can see that we pr ``` This will generate the documentation for your project. Click the book button, as shown in the screenshot below to access the docs. - + 2. Go to our project area and view `int_results`. View the description that we created in our doc block. - + 3. View the mini-lineage that looks at the model we are currently selected on (`int_results` in this case). - + 4. 
In our `dbt_project.yml`, we configured `node_colors` depending on the file directory. Starting in dbt v1.3, we can see how our lineage in our docs looks. By color coding your project, it can help you cluster together similar models or steps and more easily troubleshoot. - + ## Deploy your code @@ -1890,18 +1890,18 @@ Now that we've completed testing and documenting our work, we're ready to deploy 1. Before getting started, let's make sure that we've committed all of our work to our feature branch. If you still have work to commit, you'll be able to select the **Commit and push**, provide a message, and then select **Commit** again. 2. Once all of your work is committed, the git workflow button will now appear as **Merge to main**. Select **Merge to main** and the merge process will automatically run in the background. - + 3. When it's completed, you should see the git button read **Create branch** and the branch you're currently looking at will become **main**. 4. Now that all of our development work has been merged to the main branch, we can build our deployment job. Given that our production environment and production job were created automatically for us through Partner Connect, all we need to do here is update some default configurations to meet our needs. 5. In the menu, select **Deploy** **> Environments** - + 6. You should see two environments listed and you'll want to select the **Deployment** environment then **Settings** to modify it. 7. Before making any changes, let's touch on what is defined within this environment. The Snowflake connection shows the credentials that dbt Cloud is using for this environment and in our case they are the same as what was created for us through Partner Connect. Our deployment job will build in our `PC_DBT_DB` database and use the default Partner Connect role and warehouse to do so. The deployment credentials section also uses the info that was created in our Partner Connect job to create the credential connection. 
However, it is using the same default schema that we've been using as the schema for our development environment. 8. Let's update the schema to create a new schema specifically for our production environment. Click **Edit** to allow you to modify the existing field values. Navigate to **Deployment Credentials >** **schema.** 9. Update the schema name to **production**. Remember to select **Save** after you've made the change. - + 10. By updating the schema for our production environment to **production**, it ensures that our deployment job for this environment will build our dbt models in the **production** schema within the `PC_DBT_DB` database as defined in the Snowflake Connection section. 11. Now let's switch over to our production job. Click on the deploy tab again and then select **Jobs**. You should see an existing and preconfigured **Partner Connect Trial Job**. Similar to the environment, click on the job, then select **Settings** to modify it. Let's take a look at the job to understand it before making changes. @@ -1912,11 +1912,11 @@ Now that we've completed testing and documenting our work, we're ready to deploy So, what are we changing then? Just the name! Click **Edit** to allow you to make changes. Then update the name of the job to **Production Job** to denote this as our production deployment job. After that's done, click **Save**. 12. Now let's go to run our job. Clicking on the job name in the path at the top of the screen will take you back to the job run history page where you'll be able to click **Run run** to kick off the job. If you encounter any job failures, try running the job again before further troubleshooting. - - + + 13. Let's go over to Snowflake to confirm that everything built as expected in our production schema. Refresh the database objects in your Snowflake account and you should see the production schema now within our default Partner Connect database. 
If you click into the schema and everything ran successfully, you should be able to see all of the models we developed. - + ### Conclusion diff --git a/website/docs/guides/dremio-lakehouse.md b/website/docs/guides/dremio-lakehouse.md index 378ec857f6a..c8a8c4cf83b 100644 --- a/website/docs/guides/dremio-lakehouse.md +++ b/website/docs/guides/dremio-lakehouse.md @@ -143,7 +143,7 @@ dremioSamples: Now that you have a running environment and a completed job, you can view the data in Dremio and expand your code. This is a snapshot of the project structure in an IDE: - + ## About the schema.yml @@ -156,7 +156,7 @@ The models correspond to both weather and trip data respectively and will be joi The sources can be found by navigating to the **Object Storage** section of the Dremio Cloud UI. - + ## About the models @@ -170,11 +170,11 @@ The sources can be found by navigating to the **Object Storage** section of the When you run the dbt job, it will create a **dev** space folder that has all the data assets created. This is what you will see in Dremio Cloud UI. Spaces in Dremio is a way to organize data assets which map to business units or data products. - + Open the **Application folder** and you will see the output of the simple transformation we did using dbt. - + ## Query the data diff --git a/website/docs/guides/manual-install-qs.md b/website/docs/guides/manual-install-qs.md index fcd1e5e9599..082d23bc77e 100644 --- a/website/docs/guides/manual-install-qs.md +++ b/website/docs/guides/manual-install-qs.md @@ -67,7 +67,7 @@ $ pwd 5. Use a code editor like Atom or VSCode to open the project directory you created in the previous steps, which we named jaffle_shop. The content includes folders and `.sql` and `.yml` files generated by the `init` command.
- +
6. dbt provides the following values in the `dbt_project.yml` file: @@ -126,7 +126,7 @@ $ dbt debug ```
- +
### FAQs @@ -150,7 +150,7 @@ dbt run You should have an output that looks like this:
- +
## Commit your changes @@ -197,7 +197,7 @@ $ git checkout -b add-customers-model 4. From the command line, enter `dbt run`.
- +
When you return to the BigQuery console, you can `select` from this model. diff --git a/website/docs/guides/redshift-qs.md b/website/docs/guides/redshift-qs.md index c81a4d247a5..5f3395acb82 100644 --- a/website/docs/guides/redshift-qs.md +++ b/website/docs/guides/redshift-qs.md @@ -43,17 +43,17 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen 3. Click **Next** for each page until you reach the **Select acknowledgement** checkbox. Select **I acknowledge that AWS CloudFormation might create IAM resources with custom names** and click **Create Stack**. You should land on the stack page with a CREATE_IN_PROGRESS status. - + 4. When the stack status changes to CREATE_COMPLETE, click the **Outputs** tab on the top to view information that you will use throughout the rest of this guide. Save those credentials for later by keeping this open in a tab. 5. Type `Redshift` in the search bar at the top and click **Amazon Redshift**. - + 6. Confirm that your new Redshift cluster is listed in **Cluster overview**. Select your new cluster. The cluster name should begin with `dbtredshiftcluster-`. Then, click **Query Data**. You can choose the classic query editor or v2. We will be using the v2 version for the purpose of this guide. - + 7. You might be asked to Configure account. For this sandbox environment, we recommend selecting “Configure account”. @@ -63,9 +63,9 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen - **User name** — `dbtadmin` - **Password** — Use the autogenerated `RSadminpassword` from the output of the stack and save it for later. - + - + 9. Click **Create connection**. @@ -80,15 +80,15 @@ Now we are going to load our sample data into the S3 bucket that our Cloudformat 2. Now we are going to use the S3 bucket that you created with CloudFormation and upload the files. Go to the search bar at the top and type in `S3` and click on S3. 
There will be sample data in the bucket already, feel free to ignore it or use it for other modeling exploration. The bucket will be prefixed with `dbt-data-lake`. - + 3. Click on the `name of the bucket` S3 bucket. If you have multiple S3 buckets, this will be the bucket that was listed under “Workshopbucket” on the Outputs page. - + 4. Click **Upload**. Drag the three files into the UI and click the **Upload** button. - + 5. Remember the name of the S3 bucket for later. It should look like this: `s3://dbt-data-lake-xxxx`. You will need it for the next section. 6. Now let’s go back to the Redshift query editor. Search for Redshift in the search bar, choose your cluster, and select Query data. @@ -171,7 +171,7 @@ Now we are going to load our sample data into the S3 bucket that our Cloudformat - **Port** — `5439` - **Database** — `dbtworkshop`.
- +
5. Set your development credentials. These credentials will be used by dbt Cloud to connect to Redshift. Those credentials (as provided in your CloudFormation output) will be: @@ -179,7 +179,7 @@ Now we are going to load our sample data into the S3 bucket that our Cloudformat - **Password** — This is the autogenerated password that you used earlier in the guide - **Schema** — dbt Cloud automatically generates a schema name for you. By convention, this is `dbt_`. This is the schema connected directly to your development environment, and it's where your models will be built when running dbt within the Cloud IDE.
- +
6. Click **Test Connection**. This verifies that dbt Cloud can access your Redshift cluster. diff --git a/website/docs/guides/refactoring-legacy-sql.md b/website/docs/guides/refactoring-legacy-sql.md index a339e523020..b12baac95cd 100644 --- a/website/docs/guides/refactoring-legacy-sql.md +++ b/website/docs/guides/refactoring-legacy-sql.md @@ -44,7 +44,7 @@ While refactoring you'll be **moving around** a lot of logic, but ideally you wo To get going, you'll copy your legacy SQL query into your dbt project, by saving it in a `.sql` file under the `/models` directory of your project. - + Once you've copied it over, you'll want to `dbt run` to execute the query and populate the in your warehouse. @@ -76,7 +76,7 @@ If you're migrating multiple stored procedures into dbt, with sources you can se This allows you to consolidate modeling work on those base tables, rather than calling them separately in multiple places. - + #### Build the habit of analytics-as-code Sources are an easy way to get your feet wet using config files to define aspects of your transformation pipeline. diff --git a/website/docs/guides/set-up-ci.md b/website/docs/guides/set-up-ci.md index 89d7c5a14fa..aa4811d9339 100644 --- a/website/docs/guides/set-up-ci.md +++ b/website/docs/guides/set-up-ci.md @@ -22,7 +22,7 @@ After that, there's time to get fancy, but let's walk before we run. In this guide, we're going to add a **CI environment**, where proposed changes can be validated in the context of the entire project without impacting production systems. We will use a single set of deployment credentials (like the Prod environment), but models are built in a separate location to avoid impacting others (like the Dev environment). Your git flow will look like this: - + ### Prerequisites @@ -309,7 +309,7 @@ The team at Sunrun maintained a SOX-compliant deployment in dbt while reducing t In this section, we will add a new **QA** environment. 
New features will branch off from and be merged back into the associated `qa` branch, and a member of your team (the "Release Manager") will create a PR against `main` to be validated in the CI environment before going live. The git flow will look like this: - + ### Advanced prerequisites @@ -323,7 +323,7 @@ As noted above, this branch will outlive any individual feature, and will be the See [Custom branch behavior](/docs/dbt-cloud-environments#custom-branch-behavior). Setting `qa` as your custom branch ensures that the IDE creates new branches and PRs with the correct target, instead of using `main`. - + ### 3. Create a new QA environment diff --git a/website/docs/guides/snowflake-qs.md b/website/docs/guides/snowflake-qs.md index 0401c37871f..492609c9bcf 100644 --- a/website/docs/guides/snowflake-qs.md +++ b/website/docs/guides/snowflake-qs.md @@ -143,35 +143,35 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno 1. In the Snowflake UI, click on the home icon in the upper left corner. In the left sidebar, select **Admin**. Then, select **Partner Connect**. Find the dbt tile by scrolling or by searching for dbt in the search bar. Click the tile to connect to dbt. - + If you’re using the classic version of the Snowflake UI, you can click the **Partner Connect** button in the top bar of your account. From there, click on the dbt tile to open up the connect box. - + 2. In the **Connect to dbt** popup, find the **Optional Grant** option and select the **RAW** and **ANALYTICS** databases. This will grant access for your new dbt user role to each database. Then, click **Connect**. - + - + 3. Click **Activate** when a popup appears: - + - + 4. After the new tab loads, you will see a form. If you already created a dbt Cloud account, you will be asked to provide an account name. If you haven't created account, you will be asked to provide an account name and password. - + 5. 
After you have filled out the form and clicked **Complete Registration**, you will be logged into dbt Cloud automatically. 6. From your **Account Settings** in dbt Cloud (using the gear menu in the upper right corner), choose the "Partner Connect Trial" project and select **snowflake** in the overview table. Select edit and update the fields **Database** and **Warehouse** to be `analytics` and `transforming`, respectively. - + - +
@@ -181,7 +181,7 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno 2. Enter a project name and click **Continue**. 3. For the warehouse, click **Snowflake** then **Next** to set up your connection. - + 4. Enter your **Settings** for Snowflake with: * **Account** — Find your account by using the Snowflake trial account URL and removing `snowflakecomputing.com`. The order of your account information will vary by Snowflake version. For example, Snowflake's Classic console URL might look like: `oq65696.west-us-2.azure.snowflakecomputing.com`. The AppUI or Snowsight URL might look more like: `snowflakecomputing.com/west-us-2.azure/oq65696`. In both examples, your account will be: `oq65696.west-us-2.azure`. For more information, see [Account Identifiers](https://docs.snowflake.com/en/user-guide/admin-account-identifier.html) in the Snowflake docs. @@ -192,7 +192,7 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno * **Database** — `analytics`. This tells dbt to create new models in the analytics database. * **Warehouse** — `transforming`. This tells dbt to use the transforming warehouse that was created earlier. - + 5. Enter your **Development Credentials** for Snowflake with: * **Username** — The username you created for Snowflake. The username is not your email address and is usually your first and last name together in one word. @@ -201,7 +201,7 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno * **Target name** — Leave as the default. * **Threads** — Leave as 4. This is the number of simultaneous connections that dbt Cloud will make to build models concurrently. - + 6. Click **Test Connection**. This verifies that dbt Cloud can access your Snowflake account. 7. If the connection test succeeds, click **Next**. If it fails, you may need to check your Snowflake settings and credentials.
diff --git a/website/docs/guides/starburst-galaxy-qs.md b/website/docs/guides/starburst-galaxy-qs.md index 1822c83fa90..9a6c44574cd 100644 --- a/website/docs/guides/starburst-galaxy-qs.md +++ b/website/docs/guides/starburst-galaxy-qs.md @@ -92,11 +92,11 @@ In addition to Amazon S3, Starburst Galaxy supports many other data sources. To The **Amazon S3** page should look similar to this, except for the **Authentication to S3** section which is dependent on your setup: - + 8. Click **Test connection**. This verifies that Starburst Galaxy can access your S3 bucket. 9. Click **Connect catalog** if the connection test passes. - + 10. On the **Set permissions** page, click **Skip**. You can add permissions later if you want. 11. On the **Add to cluster** page, choose the cluster you want to add the catalog to from the dropdown and click **Add to cluster**. @@ -113,7 +113,7 @@ In addition to Amazon S3, Starburst Galaxy supports many other data sources. To When done, click **Add privileges**. - + ## Create tables with Starburst Galaxy To query the Jaffle Shop data with Starburst Galaxy, you need to create tables using the Jaffle Shop data that you [loaded to your S3 bucket](#load-data-to-s3). You can do this (and run any SQL statement) from the [query editor](https://docs.starburst.io/starburst-galaxy/query/query-editor.html). @@ -121,7 +121,7 @@ To query the Jaffle Shop data with Starburst Galaxy, you need to create tables u 1. Click **Query > Query editor** on the left sidebar of the Starburst Galaxy UI. The main body of the page is now the query editor. 2. Configure the query editor so it queries your S3 bucket. In the upper right corner of the query editor, select your cluster in the first gray box and select your catalog in the second gray box: - + 3. Copy and paste these queries into the query editor. Then **Run** each query individually. @@ -181,7 +181,7 @@ To query the Jaffle Shop data with Starburst Galaxy, you need to create tables u ``` 4.
When the queries are done, you can see the following hierarchy on the query editor's left sidebar: - + 5. Verify that the tables were created successfully. In the query editor, run the following queries: diff --git a/website/docs/reference/node-selection/graph-operators.md b/website/docs/reference/node-selection/graph-operators.md index 8cba43e1b52..88d99d7b92a 100644 --- a/website/docs/reference/node-selection/graph-operators.md +++ b/website/docs/reference/node-selection/graph-operators.md @@ -29,7 +29,7 @@ dbt run --select "3+my_model+4" # select my_model, its parents up to the ### The "at" operator The `@` operator is similar to `+`, but will also include _the parents of the children of the selected model_. This is useful in continuous integration environments where you want to build a model and all of its children, but the _parents_ of those children might not exist in the database yet. The selector `@snowplow_web_page_context` will build all three models shown in the diagram below. - + ```bash dbt run --models @my_model # select my_model, its children, and the parents of its children diff --git a/website/docs/reference/resource-configs/bigquery-configs.md b/website/docs/reference/resource-configs/bigquery-configs.md index 8f323bc4236..a5198fd3487 100644 --- a/website/docs/reference/resource-configs/bigquery-configs.md +++ b/website/docs/reference/resource-configs/bigquery-configs.md @@ -379,7 +379,7 @@ models: - + ### Specifying tags BigQuery table and view *tags* can be created by supplying an empty string for the label value. 
diff --git a/website/docs/reference/resource-configs/persist_docs.md b/website/docs/reference/resource-configs/persist_docs.md index 15b1e0bdb40..481f25d4e95 100644 --- a/website/docs/reference/resource-configs/persist_docs.md +++ b/website/docs/reference/resource-configs/persist_docs.md @@ -186,8 +186,8 @@ models: Run dbt and observe that the created relation and columns are annotated with your descriptions: - - diff --git a/website/docs/reference/resource-configs/spark-configs.md b/website/docs/reference/resource-configs/spark-configs.md index ce3b317f0f1..5c32fa5fc83 100644 --- a/website/docs/reference/resource-configs/spark-configs.md +++ b/website/docs/reference/resource-configs/spark-configs.md @@ -104,7 +104,7 @@ If no `partition_by` is specified, then the `insert_overwrite` strategy will ato - This strategy is not available when connecting via Databricks SQL endpoints (`method: odbc` + `endpoint`). - If connecting via a Databricks cluster + ODBC driver (`method: odbc` + `cluster`), you **must** include `set spark.sql.sources.partitionOverwriteMode DYNAMIC` in the [cluster Spark Config](https://docs.databricks.com/clusters/configure.html#spark-config) in order for dynamic partition replacement to work (`incremental_strategy: insert_overwrite` + `partition_by`). - + + If mixing images and text together, also consider using a docs block. diff --git a/website/docs/terms/dag.md b/website/docs/terms/dag.md index c6b91300bfc..b108c68806a 100644 --- a/website/docs/terms/dag.md +++ b/website/docs/terms/dag.md @@ -32,7 +32,7 @@ One of the great things about DAGs is that they are *visual*. You can clearly id Take this mini-DAG for an example: - + What can you learn from this DAG? Immediately, you may notice a handful of things: @@ -57,7 +57,7 @@ You can additionally use your DAG to help identify bottlenecks, long-running dat ...to name just a few. 
Understanding the factors impacting model performance can help you decide on [refactoring approaches](https://courses.getdbt.com/courses/refactoring-sql-for-modularity), [changing model materialization](https://docs.getdbt.com/blog/how-we-shaved-90-minutes-off-model#attempt-2-moving-to-an-incremental-model)s, replacing multiple joins with surrogate keys, or other methods. - + ### Modular data modeling best practices @@ -83,7 +83,7 @@ The marketing team at dbt Labs would be upset with us if we told you we think db Whether you’re using dbt Core or Cloud, dbt docs and the Lineage Graph are available to all dbt developers. The Lineage Graph in dbt Docs can show a model or source’s entire lineage, all within a visual frame. Clicking within a model, you can view the Lineage Graph and adjust selectors to only show certain models within the DAG. Analyzing the DAG here is a great way to diagnose potential inefficiencies or lack of modularity in your dbt project. - + The DAG is also [available in the dbt Cloud IDE](https://www.getdbt.com/blog/on-dags-hierarchies-and-ides/), so you and your team can refer to your lineage while you build your models. diff --git a/website/docs/terms/data-lineage.md b/website/docs/terms/data-lineage.md index d0162c35616..163047187ba 100644 --- a/website/docs/terms/data-lineage.md +++ b/website/docs/terms/data-lineage.md @@ -69,7 +69,7 @@ Your is used to visually show upstream dependencies, the nodes Ultimately, DAGs are an effective way to see relationships between data sources, models, and dashboards. DAGs are also a great way to see visual bottlenecks, or inefficiencies in your data work (see image below for a DAG with...many bottlenecks). Data teams can additionally add [meta fields](https://docs.getdbt.com/reference/resource-configs/meta) and documentation to nodes in the DAG to add an additional layer of governance to their dbt project. 
- + :::tip Automatic > Manual diff --git a/website/snippets/quickstarts/intro-build-models-atop-other-models.md b/website/snippets/quickstarts/intro-build-models-atop-other-models.md index 1104461079b..eeedec34892 100644 --- a/website/snippets/quickstarts/intro-build-models-atop-other-models.md +++ b/website/snippets/quickstarts/intro-build-models-atop-other-models.md @@ -2,4 +2,4 @@ As a best practice in SQL, you should separate logic that cleans up your data fr Now you can experiment by separating the logic out into separate models and using the [ref](/reference/dbt-jinja-functions/ref) function to build models on top of other models: - + From e9e2791839b554d4a4015fb6e80d9063d0bd0d2d Mon Sep 17 00:00:00 2001 From: adrianbr Date: Mon, 15 Jan 2024 12:23:17 +0100 Subject: [PATCH 32/56] Fix date in article because it affects sort order --- ...-15-serverless-free-tier-data-stack-with-dlt-and-dbt-core.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/blog/2023-12-15-serverless-free-tier-data-stack-with-dlt-and-dbt-core.md b/website/blog/2023-12-15-serverless-free-tier-data-stack-with-dlt-and-dbt-core.md index 7e63b6e1c6d..d2c6652d883 100644 --- a/website/blog/2023-12-15-serverless-free-tier-data-stack-with-dlt-and-dbt-core.md +++ b/website/blog/2023-12-15-serverless-free-tier-data-stack-with-dlt-and-dbt-core.md @@ -7,7 +7,7 @@ authors: [euan_johnston] hide_table_of_contents: false -date: 2023-01-15 +date: 2024-01-15 is_featured: false --- From 788976d87370cbee5dd614dea36c5e9bf7e4e6cf Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Mon, 15 Jan 2024 16:49:18 +0000 Subject: [PATCH 33/56] remove width attribute and add max width --- ...022-11-22-move-spreadsheets-to-your-dwh.md | 10 +- .../blog/2022-11-30-dbt-project-evaluator.md | 4 +- .../blog/2023-01-17-grouping-data-tests.md | 4 +- ...01-ingestion-time-partitioning-bigquery.md | 2 +- website/blog/2023-03-23-audit-helper.md | 16 +-- ...ng-a-kimball-dimensional-model-with-dbt.md | 4 +- 
...23-04-24-framework-refactor-alteryx-dbt.md | 10 +- ...odeling-ragged-time-varying-hierarchies.md | 2 +- .../2023-05-04-generating-dynamic-docs.md | 10 +- website/blog/2023-07-17-GPT-and-dbt-test.md | 6 +- ...023-08-01-announcing-materialized-views.md | 2 +- .../blog/2024-01-09-defer-in-development.md | 4 +- .../dbt-unity-catalog-best-practices.md | 4 +- .../docs/docs/build/custom-target-names.md | 4 +- website/docs/docs/build/data-tests.md | 2 +- .../docs/docs/build/environment-variables.md | 20 ++-- website/docs/docs/build/exposures.md | 4 +- website/docs/docs/build/python-models.md | 4 +- website/docs/docs/build/sources.md | 4 +- website/docs/docs/build/sql-models.md | 2 +- .../about-connections.md | 2 +- .../connect-apache-spark.md | 2 +- .../connect-databricks.md | 2 +- .../connect-snowflake.md | 4 +- .../connnect-bigquery.md | 4 +- .../docs/docs/cloud/git/authenticate-azure.md | 4 +- website/docs/docs/cloud/git/connect-github.md | 8 +- website/docs/docs/cloud/git/connect-gitlab.md | 14 +-- .../cloud/git/import-a-project-by-git-url.md | 12 +- website/docs/docs/cloud/git/setup-azure.md | 14 +-- .../cloud/manage-access/auth0-migration.md | 26 ++--- .../manage-access/cloud-seats-and-users.md | 4 +- .../manage-access/enterprise-permissions.md | 4 +- .../docs/cloud/manage-access/invite-users.md | 12 +- .../manage-access/set-up-bigquery-oauth.md | 10 +- .../manage-access/set-up-databricks-oauth.md | 4 +- .../manage-access/set-up-snowflake-oauth.md | 4 +- .../set-up-sso-google-workspace.md | 10 +- .../manage-access/set-up-sso-saml-2.0.md | 2 +- .../docs/cloud/manage-access/sso-overview.md | 2 +- .../docs/docs/cloud/secure/ip-restrictions.md | 4 +- .../docs/cloud/secure/redshift-privatelink.md | 14 +-- .../cloud/secure/snowflake-privatelink.md | 2 +- .../cloud-build-and-view-your-docs.md | 6 +- .../docs/docs/collaborate/documentation.md | 8 +- .../collaborate/git/managed-repository.md | 2 +- .../docs/collaborate/git/merge-conflicts.md | 10 +- 
.../docs/docs/collaborate/git/pr-template.md | 2 +- .../docs/collaborate/model-performance.md | 4 +- .../docs/dbt-cloud-apis/service-tokens.md | 2 +- .../docs/docs/dbt-cloud-apis/user-tokens.md | 2 +- .../removing-prerelease-versions.md | 2 +- .../run-details-and-logs-improvements.md | 2 +- .../81-May-2023/run-history-improvements.md | 2 +- .../86-Dec-2022/new-jobs-default-as-off.md | 2 +- .../92-July-2022/render-lineage-feature.md | 2 +- .../95-March-2022/ide-timeout-message.md | 2 +- .../95-March-2022/prep-and-waiting-time.md | 2 +- .../dbt-versions/upgrade-core-in-cloud.md | 6 +- website/docs/docs/deploy/artifacts.md | 8 +- website/docs/docs/deploy/ci-jobs.md | 4 +- .../docs/deploy/dashboard-status-tiles.md | 12 +- website/docs/docs/deploy/deploy-jobs.md | 2 +- website/docs/docs/deploy/deployment-tools.md | 6 +- website/docs/docs/deploy/source-freshness.md | 6 +- .../using-the-dbt-ide.md | 8 +- website/docs/faqs/API/rotate-token.md | 2 +- .../faqs/Accounts/change-users-license.md | 4 +- .../Accounts/cloud-upgrade-instructions.md | 6 +- website/docs/faqs/Git/git-migration.md | 2 +- website/docs/faqs/Project/delete-a-project.md | 4 +- website/docs/guides/adapter-creation.md | 14 +-- website/docs/guides/bigquery-qs.md | 4 +- website/docs/guides/codespace-qs.md | 2 +- website/docs/guides/custom-cicd-pipelines.md | 2 +- website/docs/guides/databricks-qs.md | 32 +++--- website/docs/guides/dbt-python-snowpark.md | 106 +++++++++--------- website/docs/guides/dremio-lakehouse.md | 8 +- website/docs/guides/manual-install-qs.md | 8 +- website/docs/guides/redshift-qs.md | 20 ++-- website/docs/guides/refactoring-legacy-sql.md | 4 +- website/docs/guides/set-up-ci.md | 6 +- website/docs/guides/snowflake-qs.md | 24 ++-- website/docs/guides/starburst-galaxy-qs.md | 10 +- .../node-selection/graph-operators.md | 2 +- .../resource-configs/bigquery-configs.md | 2 +- .../resource-configs/persist_docs.md | 4 +- .../resource-configs/spark-configs.md | 2 +- 
.../resource-properties/description.md | 2 +- website/docs/terms/dag.md | 6 +- website/docs/terms/data-lineage.md | 2 +- .../intro-build-models-atop-other-models.md | 2 +- .../src/components/lightbox/styles.module.css | 2 +- 93 files changed, 333 insertions(+), 333 deletions(-) diff --git a/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md b/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md index 09274b41a9b..93cf91efeed 100644 --- a/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md +++ b/website/blog/2022-11-22-move-spreadsheets-to-your-dwh.md @@ -70,9 +70,9 @@ An obvious choice if you have data to load into your warehouse would be your exi [Fivetran’s browser uploader](https://fivetran.com/docs/files/browser-upload) does exactly what it says on the tin: you upload a file to their web portal and it creates a table containing that data in a predefined schema in your warehouse. With a visual interface to modify data types, it’s easy for anyone to use. And with an account type with the permission to only upload files, you don’t need to worry about your stakeholders accidentally breaking anything either. - + - + A nice benefit of the uploader is support for updating data in the table over time. If a file with the same name and same columns is uploaded, any new records will be added, and existing records (per the ) will be updated. @@ -100,7 +100,7 @@ The main benefit of connecting to Google Sheets instead of a static spreadsheet Instead of syncing all cells in a sheet, you create a [named range](https://fivetran.com/docs/files/google-sheets/google-sheets-setup-guide) and connect Fivetran to that range. Each Fivetran connector can only read a single range—if you have multiple tabs then you’ll need to create multiple connectors, each with its own schema and table in the target warehouse. 
When a sync takes place, it will [truncate](https://docs.getdbt.com/terms/ddl#truncate) and reload the table from scratch as there is no primary key to use for matching. - + Beware of inconsistent data types though—if someone types text into a column that was originally numeric, Fivetran will automatically convert the column to a string type which might cause issues in your downstream transformations. [The recommended workaround](https://fivetran.com/docs/files/google-sheets#typetransformationsandmapping) is to explicitly cast your types in [staging models](https://docs.getdbt.com/best-practices/how-we-structure/2-staging) to ensure that any undesirable records are converted to null. @@ -119,7 +119,7 @@ Beware of inconsistent data types though—if someone types text into a column t I’m a big fan of [Fivetran’s Google Drive connector](https://fivetran.com/docs/files/google-drive); in the past I’ve used it to streamline a lot of weekly reporting. It allows stakeholders to use a tool they’re already familiar with (Google Drive) instead of dealing with another set of credentials. Every file uploaded into a specific folder on Drive (or [Box, or consumer Dropbox](https://fivetran.com/docs/files/magic-folder)) turns into a table in your warehouse. - + Like the Google Sheets connector, the data types of the columns are determined automatically. Dates, in particular, are finicky though—if you can control your input data, try to get it into [ISO 8601 format](https://xkcd.com/1179/) to minimize the amount of cleanup you have to do on the other side. @@ -174,7 +174,7 @@ Each of the major data warehouses also has native integrations to import spreads Snowflake’s options are robust and user-friendly, offering both a [web-based loader](https://docs.snowflake.com/en/user-guide/data-load-web-ui.html) as well as [a bulk importer](https://docs.snowflake.com/en/user-guide/data-load-bulk.html). 
The web loader is suitable for small to medium files (up to 50MB) and can be used for specific files, all files in a folder, or files in a folder that match a given pattern. It’s also the most provider-agnostic, with support for Amazon S3, Google Cloud Storage, Azure and the local file system. - + ### BigQuery diff --git a/website/blog/2022-11-30-dbt-project-evaluator.md b/website/blog/2022-11-30-dbt-project-evaluator.md index b936d4786cd..3ea7a459c35 100644 --- a/website/blog/2022-11-30-dbt-project-evaluator.md +++ b/website/blog/2022-11-30-dbt-project-evaluator.md @@ -20,7 +20,7 @@ If you attended [Coalesce 2022](https://www.youtube.com/watch?v=smbRwmcM1Ok), yo Don’t believe me??? Here’s photographic proof. - + Since the inception of dbt Labs, our team has been embedded with a variety of different data teams — from an over-stretched-data-team-of-one to a data-mesh-multiverse. @@ -120,4 +120,4 @@ If something isn’t working quite right or you have ideas for future functional Together, we can ensure that dbt projects across the galaxy are set up for success as they grow to infinity and beyond. - + diff --git a/website/blog/2023-01-17-grouping-data-tests.md b/website/blog/2023-01-17-grouping-data-tests.md index 3648837302b..23fcce6d27e 100644 --- a/website/blog/2023-01-17-grouping-data-tests.md +++ b/website/blog/2023-01-17-grouping-data-tests.md @@ -43,11 +43,11 @@ So what do we discover when we validate our data by group? Testing for monotonicity, we find many poorly behaved turnstiles. Unlike the well-behaved dark blue line, other turnstiles seem to _decrement_ versus _increment_ with each rotation while still others cyclically increase and plummet to zero – perhaps due to maintenance events, replacements, or glitches in communication with the central server. - + Similarly, while no expected timestamp is missing from the data altogether, a more rigorous test of timestamps _by turnstile_ reveals between roughly 50-100 missing observations for any given period. 
- + _Check out this [GitHub gist](https://gist.github.com/emilyriederer/4dcc6a05ea53c82db175e15f698a1fb6) to replicate these views locally._ diff --git a/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md b/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md index 51a62006ee8..99ce142d5ed 100644 --- a/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md +++ b/website/blog/2023-02-01-ingestion-time-partitioning-bigquery.md @@ -125,7 +125,7 @@ In both cases, the operation can be done on a single partition at a time so it r On a 192 GB partition here is how the different methods compare: - + Also, the `SELECT` statement consumed more than 10 hours of slot time while `MERGE` statement took days of slot time. diff --git a/website/blog/2023-03-23-audit-helper.md b/website/blog/2023-03-23-audit-helper.md index 106715c5e4f..8599ad5eb5d 100644 --- a/website/blog/2023-03-23-audit-helper.md +++ b/website/blog/2023-03-23-audit-helper.md @@ -19,7 +19,7 @@ It is common for analytics engineers (AE) and data analysts to have to refactor Not only is that approach time-consuming, but it is also prone to naive assumptions that values match based on aggregate measures (such as counts or sums). To provide a better, more accurate approach to auditing, dbt Labs has created the `audit_helper` package. `audit_helper` is a package for dbt whose main purpose is to audit data by comparing two tables (the original one versus a refactored model). It uses a simple and intuitive query structure that enables quickly comparing tables based on the column values, row amount, and even column types (for example, to make sure that a given column is numeric in both your table and the original one). Figure 1 graphically displays the workflow and where `audit_helper` is positioned in the refactoring process. 
- + Now that it is clear where the `audit_helper` package is positioned in the refactoring process, it is important to highlight the benefits of using audit_helper (and ultimately, of auditing refactored models). Among the benefits, we can mention: - **Quality assurance**: Assert that a refactored model is reaching the same output as the original model that is being refactored. @@ -57,12 +57,12 @@ According to the `audit_helper` package documentation, this macro comes in handy ### How it works When you run the dbt audit model, it will compare all columns, row by row. To count for the match, every column in a row from one source must exactly match a row from another source, as illustrated in the example in Figure 2 below: - + As shown in the example, the model is compared line by line, and in this case, all lines in both models are equivalent and the result should be 100%. Figure 3 below depicts a row in which two of the three columns are equal and only the last column of row 1 has divergent values. In this case, despite the fact that most of row 1 is identical, that row will not be counted towards the final result. In this example, only row 2 and row 3 are valid, yielding a 66.6% match in the total of analyzed rows. - + As previously stated, for the match to be valid, all column values of a model’s row must be equal to the other model. This is why we sometimes need to exclude columns from the comparison (such as date columns, which can have a time zone difference from the original model to the refactored — we will discuss tips like these below). @@ -103,12 +103,12 @@ Let’s understand the arguments used in the `compare_queries` macro: - `summarize` (optional): This argument allows you to switch between a summary or detailed (verbose) view of the compared data. This argument accepts true or false values (its default is set to be true). 3. 
Replace the sources from the example with your own - + As illustrated in Figure 4, using the `ref` statements allows you to easily refer to your development model, and using the full path makes it easy to refer to the original table (which will be useful when you are refactoring a SQL Server Stored Procedure or Alteryx Workflow that is already being materialized in the data warehouse). 4. Specify your comparison columns - + Delete the example columns and replace them with the columns of your models, exactly as they are written in each model. You should rename/alias the columns to match, as well as ensuring they are in the same order within the `select` clauses. @@ -129,7 +129,7 @@ Let’s understand the arguments used in the `compare_queries` macro: ``` The output will be the similar to the one shown in Figure 6 below: - +
The output is presented in table format, with each column explained below:
@@ -155,7 +155,7 @@ While we can surely rely on that overview to validate the final refactored model A really useful way to check out which specific columns are driving down the match percentage between tables is the `compare_column_values` macro that allows us to audit column values. This macro requires a column to be set, so it can be used as an anchor to compare entries between the refactored dbt model column and the legacy table column. Figure 7 illustrates how the `compare_column_value`s macro works. - + The macro’s output summarizes the status of column compatibility, breaking it down into different categories: perfect match, both are null, values do not match, value is null in A only, value is null in B only, missing from A and missing from B. This level of detailing makes it simpler for the AE or data analyst to figure out what can be causing incompatibility issues between the models. While refactoring a model, it is common that some keys used to join models are inconsistent, bringing up unwanted null values on the final model as a result, and that would cause the audit row query to fail, without giving much more detail. @@ -224,7 +224,7 @@ Also, we can see that the example code includes a table printing option enabled But unlike from the `compare_queries` macro, if you have kept the printing function enabled, you should expect a table to be printed in the command line when you run the model, as shown in Figure 8. 
Otherwise, it will be materialized on your data warehouse like this: - + The `compare_column_values` macro separates column auditing results in seven different labels: - **Perfect match**: count of rows (and relative percentage) where the column values compared between both tables are equal and not null; diff --git a/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md b/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md index 691a7f77571..0aac3d77d53 100644 --- a/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md +++ b/website/blog/2023-04-18-building-a-kimball-dimensional-model-with-dbt.md @@ -39,7 +39,7 @@ Dimensional modeling is a technique introduced by Ralph Kimball in 1996 with his The goal of dimensional modeling is to take raw data and transform it into Fact and Dimension tables that represent the business. - + The benefits of dimensional modeling are: @@ -185,7 +185,7 @@ Now that you’ve set up the dbt project, database, and have taken a peek at the Identifying the business process is done in collaboration with the business user. The business user has context around the business objectives and business processes, and can provide you with that information. - + Upon speaking with the CEO of AdventureWorks, you learn the following information: diff --git a/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md b/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md index 2c6a9d87591..46cfcb58cdd 100644 --- a/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md +++ b/website/blog/2023-04-24-framework-refactor-alteryx-dbt.md @@ -17,7 +17,7 @@ Alteryx is a visual data transformation platform with a user-friendly interface Transforming data to follow business rules can be a complex task, especially with the increasing amount of data collected by companies. 
To reduce such complexity, data transformation solutions designed as drag-and-drop tools can be seen as more intuitive, since analysts can visualize the steps taken to transform data. One example of a popular drag-and-drop transformation tool is Alteryx which allows business analysts to transform data by dragging and dropping operators in a canvas. The graphic interface of Alteryx Designer is presented in **Figure 1**. - + Nonetheless, as data workflows become more complex, Alteryx lacks the modularity, documentation, and version control capabilities that these flows require. In this sense, dbt may be a more suitable solution to building resilient and modular data pipelines due to its focus on data modeling. @@ -62,7 +62,7 @@ This blog post reports a consulting project for a major client at Indicium Tech When the client hired Indicium, they had dozens of Alteryx workflows built and running daily solely for the marketing team, which was the focus of the project. For the marketing team, the Alteryx workflows had to be executed in the correct order since they were interdependent, which means one Alteryx workflow used the outcome of the previous one, and so on. The main Alteryx workflows run daily by the marketing team took about 6 hours to run. Another important aspect to consider was that if a model had not finished running when the next one downstream began to run, the data would be incomplete, requiring the workflow to be run again. The execution of all models was usually scheduled to run overnight and by early morning, so the data would be up to date the next day. But if there was an error the night before, the data would be incorrect or out of date. **Figure 3** exemplifies the scheduler. - + Data lineage was a point that added a lot of extra labor because it was difficult to identify which models were dependent on others with so many Alteryx workflows built. 
When the number of workflows increased, it required a long time to create a view of that lineage in another software. So, if a column's name changed in a model due to a change in the model's source, the marketing analysts would have to map which downstream models were impacted by such change to make the necessary adjustments. Because model lineage was mapped manually, it was a challenge to keep it up to date. @@ -89,7 +89,7 @@ The first step is to validate all data sources and create one com It is essential to click on each data source (the green book icons on the leftmost side of **Figure 5**) and examine whether any transformations have been done inside that data source query. It is very common for a source icon to contain more than one data source or filter, which is why this step is important. The next step is to follow the workflow and transcribe the transformations into SQL queries in the dbt models to replicate the same data transformations as in the Alteryx workflow. - + For this step, we identified which operators were used in the data source (for example, joining data, order columns, group by, etc). Usually the Alteryx operators are pretty self-explanatory and all the information needed for understanding appears on the left side of the menu. We also checked the documentation to understand how each Alteryx operator works behind the scenes. @@ -102,7 +102,7 @@ Auditing large models, with sometimes dozens of columns and millions of rows, ca In this project, we used [the `audit_helper` package](https://github.com/dbt-labs/dbt-audit-helper), because it provides more robust auditing macros and offers more automation possibilities for our use case. To that end, we needed to have both the legacy Alteryx workflow output table and the refactored dbt model materialized in the project’s data warehouse. 
Then we used the macros available in `audit_helper` to compare query results, data types, column values, row numbers and many more things that are available within the package. For an in-depth explanation and tutorial on how to use the `audit_helper` package, check out [this blog post](https://docs.getdbt.com/blog/audit-helper-for-migration). **Figure 6** graphically illustrates the validation logic behind audit_helper. - + #### Step 4: Duplicate reports and connect them to the dbt refactored models @@ -120,7 +120,7 @@ The conversion proved to be of great value to the client due to three main aspec - Improved workflow visibility: dbt’s support for documentation and testing, associated with dbt Cloud, allows for great visibility of the workflow’s lineage execution, accelerating errors and data inconsistencies identification and troubleshooting. More than once, our team was able to identify the impact of one column’s logic alteration in downstream models much earlier than these Alteryx models. - Workflow simplification: dbt’s modularized approach of data modeling, aside from accelerating total run time of the data workflow, simplified the construction of new tables, based on the already existing modules, and improved code readability. - + As we can see, refactoring Alteryx to dbt was an important step in the direction of data availability, and allowed for much more agile processes for the client’s data team. With less time dedicated to manually executing sequential Alteryx workflows that took hours to complete, and searching for errors in each individual file, the analysts could focus on what they do best: **getting insights from the data and generating value from them**. 
diff --git a/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md b/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md index 2b00787cc07..f719bdb40cb 100644 --- a/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md +++ b/website/blog/2023-05-02-modeling-ragged-time-varying-hierarchies.md @@ -22,7 +22,7 @@ To help visualize this data, we're going to pretend we are a company that manufa Obviously, a real bike could have a hundred or more separate components. To keep things simple for this article, let's just consider the bike, the frame, a wheel, the wheel rim, tire, and tube. Our component hierarchy looks like: - + This hierarchy is *ragged* because different paths through the hierarchy terminate at different depths. It is *time-varying* because specific components can be added and removed. diff --git a/website/blog/2023-05-04-generating-dynamic-docs.md b/website/blog/2023-05-04-generating-dynamic-docs.md index b6e8d929e72..1e704178b0a 100644 --- a/website/blog/2023-05-04-generating-dynamic-docs.md +++ b/website/blog/2023-05-04-generating-dynamic-docs.md @@ -35,7 +35,7 @@ This results in a lot of the same columns (e.g. `account_id`) existing in differ In fact, I found a better way using some CLI commands, the dbt Codegen package and docs blocks. I also made the following meme in the [dbt Community Slack](https://www.getdbt.com/community/join-the-community/) channel #memes-and-off-topic-chatter to encapsulate this method: - + ## What pain is being solved? @@ -279,7 +279,7 @@ To confirm the formatting works, run the following command to get dbt Docs up an ``` $ dbt docs && dbt docs serve ``` - + Here, you can confirm that the column descriptions using the doc blocks are working as intended. 
@@ -326,7 +326,7 @@ user_id ``` Now, open your code editor, and replace `(.*)` with `{% docs column__activity_based_interest__$1 %}\n\n{% enddocs %}\n`, which will result in the following in your markdown file: - + Now you can add documentation to each of your columns. @@ -334,7 +334,7 @@ Now you can add documentation to each of your columns. You can programmatically identify all columns, and have them point towards the newly-created documentation. In your code editor, replace `\s{6}- name: (.*)\n description: ""` with ` - name: $1\n description: "{{ doc('column__activity_based_interest__$1') }}`: - + ⚠️ Some of your columns may already be available in existing docs blocks. In this example, the following replacements are done: - `{{ doc('column__activity_based_interest__user_id') }}` → `{{ doc("column_user_id") }}` @@ -343,7 +343,7 @@ You can programmatically identify all columns, and have them point towards the n ## Check that everything works Run `dbt docs generate`. If there are syntax errors, this will be found out at this stage. 
If successful, we can run `dbt docs serve` to perform a smoke test and ensure everything looks right: - + ## Additional considerations diff --git a/website/blog/2023-07-17-GPT-and-dbt-test.md b/website/blog/2023-07-17-GPT-and-dbt-test.md index 12e380eb220..84f756919a5 100644 --- a/website/blog/2023-07-17-GPT-and-dbt-test.md +++ b/website/blog/2023-07-17-GPT-and-dbt-test.md @@ -55,7 +55,7 @@ We all know how ChatGPT can digest very complex prompts, but as this is a tool f Opening ChatGPT with GPT4, my first prompt is usually along these lines: - + And the output of this simple prompt is nothing short of amazing: @@ -118,7 +118,7 @@ Back in my day (5 months ago), ChatGPT with GPT 3.5 didn’t have much context o A prompt for it would look something like: - + ## Specify details on generic tests in your prompts @@ -133,7 +133,7 @@ Accepted_values and relationships are slightly trickier but the model can be adj One way of doing this is with a prompt like this: - + Which results in the following output: diff --git a/website/blog/2023-08-01-announcing-materialized-views.md b/website/blog/2023-08-01-announcing-materialized-views.md index 6534e1d0b56..eb9716e73a5 100644 --- a/website/blog/2023-08-01-announcing-materialized-views.md +++ b/website/blog/2023-08-01-announcing-materialized-views.md @@ -103,7 +103,7 @@ When we talk about using materialized views in development, the question to thin Outside of the scheduling part, development will be pretty standard. Your pipeline is likely going to look something like this: - + This is assuming you have a near real time pipeline where you are pulling from a streaming data source like a Kafka Topic via an ingestion tool of your choice like Snowpipe for Streaming into your data platform. 
After your data is in the data platform, you will: diff --git a/website/blog/2024-01-09-defer-in-development.md b/website/blog/2024-01-09-defer-in-development.md index 406b036cab4..96e2ed53f85 100644 --- a/website/blog/2024-01-09-defer-in-development.md +++ b/website/blog/2024-01-09-defer-in-development.md @@ -80,7 +80,7 @@ dbt Cloud offers a seamless deferral experience in both the dbt Cloud IDE and th In the dbt Cloud IDE, there’s as simple toggle switch labeled `Defer to production`. Simply enabling this toggle will defer your command to the production environment when you run any dbt command in the IDE! - + The cloud CLI has this setting *on by default* — there’s nothing else you need to do to set this up! If you prefer not to defer, you can pass the `--no-defer` flag to override this behavior. You can also set an environment other than your production environment as the deferred to environment in your `dbt-cloud` settings in your `dbt_project.yml` : @@ -155,6 +155,6 @@ While defer is a faster and cheaper option for most folks in most situations, de ### Call me Willem Defer - + Defer to prod is a powerful way to improve your development velocity with dbt, and dbt Cloud makes it easier than ever to make use of this feature! You too could look this cool while you’re saving time and money developing on your dbt projects! 
diff --git a/website/docs/best-practices/dbt-unity-catalog-best-practices.md b/website/docs/best-practices/dbt-unity-catalog-best-practices.md index 5f230263cf8..a55e1d121af 100644 --- a/website/docs/best-practices/dbt-unity-catalog-best-practices.md +++ b/website/docs/best-practices/dbt-unity-catalog-best-practices.md @@ -21,11 +21,11 @@ If you use multiple Databricks workspaces to isolate development from production To do so, use dbt's [environment variable syntax](https://docs.getdbt.com/docs/dbt-cloud/using-dbt-cloud/cloud-environment-variables#special-environment-variables) for Server Hostname of your Databricks workspace URL and HTTP Path for the SQL warehouse in your connection settings. Note that Server Hostname still needs to appear to be a valid domain name to pass validation checks, so you will need to hard-code the domain suffix on the URL, eg `{{env_var('DBT_HOSTNAME')}}.cloud.databricks.com` and the path prefix for your warehouses, eg `/sql/1.0/warehouses/{{env_var('DBT_HTTP_PATH')}}`. - + When you create environments in dbt Cloud, you can assign environment variables to populate the connection information dynamically. Don’t forget to make sure the tokens you use in the credentials for those environments were generated from the associated workspace. - + ## Access Control diff --git a/website/docs/docs/build/custom-target-names.md b/website/docs/docs/build/custom-target-names.md index 4786641678d..ac7036de572 100644 --- a/website/docs/docs/build/custom-target-names.md +++ b/website/docs/docs/build/custom-target-names.md @@ -21,9 +21,9 @@ where created_at > date_trunc('month', current_date) To set a custom target name for a job in dbt Cloud, configure the **Target Name** field for your job in the Job Settings page. - + ## dbt Cloud IDE When developing in dbt Cloud, you can set a custom target name in your development credentials. 
Go to your account (from the gear menu in the top right hand corner), select the project under **Credentials**, and update the target name. - + diff --git a/website/docs/docs/build/data-tests.md b/website/docs/docs/build/data-tests.md index 7c12e5d7059..d981d7e272d 100644 --- a/website/docs/docs/build/data-tests.md +++ b/website/docs/docs/build/data-tests.md @@ -245,7 +245,7 @@ Normally, a data test query will calculate failures as part of its execution. If This workflow allows you to query and examine failing records much more quickly in development: - + Note that, if you elect to store test failures: * Test result tables are created in a schema suffixed or named `dbt_test__audit`, by default. It is possible to change this value by setting a `schema` config. (For more details on schema naming, see [using custom schemas](/docs/build/custom-schemas).) diff --git a/website/docs/docs/build/environment-variables.md b/website/docs/docs/build/environment-variables.md index 14076352ac1..3f2aebd0036 100644 --- a/website/docs/docs/build/environment-variables.md +++ b/website/docs/docs/build/environment-variables.md @@ -17,7 +17,7 @@ Environment variables in dbt Cloud must be prefixed with either `DBT_` or `DBT_E Environment variable values can be set in multiple places within dbt Cloud. As a result, dbt Cloud will interpret environment variables according to the following order of precedence (lowest to highest): - + There are four levels of environment variables: 1. the optional default argument supplied to the `env_var` Jinja function in code @@ -30,7 +30,7 @@ There are four levels of environment variables: To set environment variables at the project and environment level, click **Deploy** in the top left, then select **Environments**. Click **Environments Variables** to add and update your environment variables. - + @@ -38,7 +38,7 @@ You'll notice there is a `Project Default` column. 
This is a great place to set To the right of the `Project Default` column are all your environments. Values set at the environment level take priority over the project level default value. This is where you can tell dbt Cloud to interpret an environment value differently in your Staging vs. Production environment, as example. - + @@ -48,12 +48,12 @@ You may have multiple jobs that run in the same environment, and you'd like the When setting up or editing a job, you will see a section where you can override environment variable values defined at the environment or project level. - + Every job runs in a specific, deployment environment, and by default, a job will inherit the values set at the environment level (or the highest precedence level set) for the environment in which it runs. If you'd like to set a different value at the job level, edit the value to override it. - + **Overriding environment variables at the personal level** @@ -61,11 +61,11 @@ Every job runs in a specific, deployment environment, and by default, a job will You can also set a personal value override for an environment variable when you develop in the dbt integrated developer environment (IDE). By default, dbt Cloud uses environment variable values set in the project's development environment. To see and override these values, click the gear icon in the top right. Under "Your Profile," click **Credentials** and select your project. Click **Edit** and make any changes in "Environment Variables." - + To supply an override, developers can edit and specify a different value to use. These values will be respected in the IDE both for the Results and Compiled SQL tabs. - + :::info Appropriate coverage If you have not set a project level default value for every environment variable, it may be possible that dbt Cloud does not know how to interpret the value of an environment variable in all contexts. In such cases, dbt will throw a compilation error: "Env var required but not provided". 
@@ -77,7 +77,7 @@ If you change the value of an environment variable mid-session while using the I To refresh the IDE mid-development, click on either the green 'ready' signal or the red 'compilation error' message at the bottom right corner of the IDE. A new modal will pop up, and you should select the Refresh IDE button. This will load your environment variables values into your development environment. - + There are some known issues with partial parsing of a project and changing environment variables mid-session in the IDE. If you find that your dbt project is not compiling to the values you've set, try deleting the `target/partial_parse.msgpack` file in your dbt project which will force dbt to re-compile your whole project. @@ -86,7 +86,7 @@ There are some known issues with partial parsing of a project and changing envir While all environment variables are encrypted at rest in dbt Cloud, dbt Cloud has additional capabilities for managing environment variables with secret or otherwise sensitive values. If you want a particular environment variable to be scrubbed from all logs and error messages, in addition to obfuscating the value in the UI, you can prefix the key with `DBT_ENV_SECRET_`. This functionality is supported from `dbt v1.0` and on. - + **Note**: An environment variable can be used to store a [git token for repo cloning](/docs/build/environment-variables#clone-private-packages). We recommend you make the git token's permissions read only and consider using a machine account or service user's PAT with limited repo access in order to practice good security hygiene. @@ -131,7 +131,7 @@ Currently, it's not possible to dynamically set environment variables across mod **Note** — You can also use this method with Databricks SQL Warehouse. 
- + :::info Environment variables and Snowflake OAuth limitations Env vars works fine with username/password and keypair, including scheduled jobs, because dbt Core consumes the Jinja inserted into the autogenerated `profiles.yml` and resolves it to do an `env_var` lookup. diff --git a/website/docs/docs/build/exposures.md b/website/docs/docs/build/exposures.md index a26ac10bd36..65c0792e0a0 100644 --- a/website/docs/docs/build/exposures.md +++ b/website/docs/docs/build/exposures.md @@ -118,8 +118,8 @@ dbt test -s +exposure:weekly_jaffle_report When we generate our documentation site, you'll see the exposure appear: - - + + ## Related docs diff --git a/website/docs/docs/build/python-models.md b/website/docs/docs/build/python-models.md index b24d3129f0c..3fe194a4cb7 100644 --- a/website/docs/docs/build/python-models.md +++ b/website/docs/docs/build/python-models.md @@ -660,7 +660,7 @@ Use the `cluster` submission method with dedicated Dataproc clusters you or your - Enable Dataproc APIs for your project + region - If using the `cluster` submission method: Create or use an existing [Dataproc cluster](https://cloud.google.com/dataproc/docs/guides/create-cluster) with the [Spark BigQuery connector initialization action](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/connectors#bigquery-connectors). (Google recommends copying the action into your own Cloud Storage bucket, rather than using the example version shown in the screenshot) - + The following configurations are needed to run Python models on Dataproc. You can add these to your [BigQuery profile](/docs/core/connect-data-platform/bigquery-setup#running-python-models-on-dataproc) or configure them on specific Python models: - `gcs_bucket`: Storage bucket to which dbt will upload your model's compiled PySpark code. 
@@ -706,7 +706,7 @@ Google recommends installing Python packages on Dataproc clusters via initializa You can also install packages at cluster creation time by [defining cluster properties](https://cloud.google.com/dataproc/docs/tutorials/python-configuration#image_version_20): `dataproc:pip.packages` or `dataproc:conda.packages`. - + **Docs:** - [Dataproc overview](https://cloud.google.com/dataproc/docs/concepts/overview) diff --git a/website/docs/docs/build/sources.md b/website/docs/docs/build/sources.md index e4fb10ac725..466bcedc688 100644 --- a/website/docs/docs/build/sources.md +++ b/website/docs/docs/build/sources.md @@ -84,7 +84,7 @@ left join raw.jaffle_shop.customers using (customer_id) Using the `{{ source () }}` function also creates a dependency between the model and the source table. - + ### Testing and documenting sources You can also: @@ -189,7 +189,7 @@ from raw.jaffle_shop.orders The results of this query are used to determine whether the source is fresh or not: - + ### Filter diff --git a/website/docs/docs/build/sql-models.md b/website/docs/docs/build/sql-models.md index d33e4798974..a0dd174278b 100644 --- a/website/docs/docs/build/sql-models.md +++ b/website/docs/docs/build/sql-models.md @@ -254,7 +254,7 @@ create view analytics.customers as ( dbt uses the `ref` function to: * Determine the order to run the models by creating a dependent acyclic graph (DAG). - + * Manage separate environments — dbt will replace the model specified in the `ref` function with the database name for the (or view). Importantly, this is environment-aware — if you're running dbt with a target schema named `dbt_alice`, it will select from an upstream table in the same schema. Check out the tabs above to see this in action. 
diff --git a/website/docs/docs/cloud/connect-data-platform/about-connections.md b/website/docs/docs/cloud/connect-data-platform/about-connections.md index bc4a515112d..93bbf83584f 100644 --- a/website/docs/docs/cloud/connect-data-platform/about-connections.md +++ b/website/docs/docs/cloud/connect-data-platform/about-connections.md @@ -22,7 +22,7 @@ import MSCallout from '/snippets/_microsoft-adapters-soon.md'; You can connect to your database in dbt Cloud by clicking the gear in the top right and selecting **Account Settings**. From the Account Settings page, click **+ New Project**. - + These connection instructions provide the basic fields required for configuring a data platform connection in dbt Cloud. For more detailed guides, which include demo project data, read our [Quickstart guides](https://docs.getdbt.com/guides) diff --git a/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md b/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md index eecf0a8e229..0186d821a54 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-apache-spark.md @@ -36,4 +36,4 @@ HTTP and Thrift connection methods: | Auth | Optional, supply if using Kerberos | `KERBEROS` | | Kerberos Service Name | Optional, supply if using Kerberos | `hive` | - + diff --git a/website/docs/docs/cloud/connect-data-platform/connect-databricks.md b/website/docs/docs/cloud/connect-data-platform/connect-databricks.md index ebf6be63bd1..032246ad16a 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-databricks.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-databricks.md @@ -37,4 +37,4 @@ To set up the Databricks connection, supply the following fields: | HTTP Path | The HTTP path of the Databricks cluster or SQL warehouse | /sql/1.0/warehouses/1a23b4596cd7e8fg | | Catalog | Name of Databricks Catalog (optional) | Production | - + diff --git 
a/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md b/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md index 9193a890ed3..c265529fb49 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md @@ -27,7 +27,7 @@ username (specifically, the `login_name`) and the corresponding user's Snowflake to authenticate dbt Cloud to run queries against Snowflake on behalf of a Snowflake user. **Note**: The schema field in the **Developer Credentials** section is a required field. - + ### Key Pair @@ -68,7 +68,7 @@ As of dbt version 1.5.0, you can use a `private_key` string in place of `private The OAuth auth method permits dbt Cloud to run development queries on behalf of a Snowflake user without the configuration of Snowflake password in dbt Cloud. For more information on configuring a Snowflake OAuth connection in dbt Cloud, please see [the docs on setting up Snowflake OAuth](/docs/cloud/manage-access/set-up-snowflake-oauth). - + ## Configuration diff --git a/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md b/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md index 2e637b7450a..7ea6e380000 100644 --- a/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md +++ b/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md @@ -32,7 +32,7 @@ In addition to these fields, there are two other optional fields that can be con - + ### BigQuery OAuth **Available in:** Development environments, Enterprise plans only @@ -43,7 +43,7 @@ more information on the initial configuration of a BigQuery OAuth connection in [the docs on setting up BigQuery OAuth](/docs/cloud/manage-access/set-up-bigquery-oauth). 
As an end user, if your organization has set up BigQuery OAuth, you can link a project with your personal BigQuery account in your personal Profile in dbt Cloud, like so: - + ## Configuration diff --git a/website/docs/docs/cloud/git/authenticate-azure.md b/website/docs/docs/cloud/git/authenticate-azure.md index bbb2cff8b29..42028bf993b 100644 --- a/website/docs/docs/cloud/git/authenticate-azure.md +++ b/website/docs/docs/cloud/git/authenticate-azure.md @@ -16,11 +16,11 @@ Connect your dbt Cloud profile to Azure DevOps using OAuth: 1. Click the gear icon at the top right and select **Profile settings**. 2. Click **Linked Accounts**. 3. Next to Azure DevOps, click **Link**. - + 4. Once you're redirected to Azure DevOps, sign into your account. 5. When you see the permission request screen from Azure DevOps App, click **Accept**. - + You will be directed back to dbt Cloud, and your profile should be linked. You are now ready to develop in dbt Cloud! diff --git a/website/docs/docs/cloud/git/connect-github.md b/website/docs/docs/cloud/git/connect-github.md index 715f23912e5..ff0f2fff18f 100644 --- a/website/docs/docs/cloud/git/connect-github.md +++ b/website/docs/docs/cloud/git/connect-github.md @@ -30,13 +30,13 @@ To connect your dbt Cloud account to your GitHub account: 2. Select **Linked Accounts** from the left menu. - + 3. In the **Linked Accounts** section, set up your GitHub account connection to dbt Cloud by clicking **Link** to the right of GitHub. This redirects you to your account on GitHub where you will be asked to install and configure the dbt Cloud application. 4. Select the GitHub organization and repositories dbt Cloud should access. - + 5. 
Assign the dbt Cloud GitHub App the following permissions: - Read access to metadata @@ -52,7 +52,7 @@ To connect your dbt Cloud account to your GitHub account: ## Limiting repository access in GitHub If you are your GitHub organization owner, you can also configure the dbt Cloud GitHub application to have access to only select repositories. This configuration must be done in GitHub, but we provide an easy link in dbt Cloud to start this process. - + ## Personally authenticate with GitHub @@ -70,7 +70,7 @@ To connect a personal GitHub account: 2. Select **Linked Accounts** in the left menu. If your GitHub account is not connected, you’ll see "No connected account". 3. Select **Link** to begin the setup process. You’ll be redirected to GitHub, and asked to authorize dbt Cloud in a grant screen. - + 4. Once you approve authorization, you will be redirected to dbt Cloud, and you should now see your connected account. diff --git a/website/docs/docs/cloud/git/connect-gitlab.md b/website/docs/docs/cloud/git/connect-gitlab.md index 316e6af0135..e55552e2d86 100644 --- a/website/docs/docs/cloud/git/connect-gitlab.md +++ b/website/docs/docs/cloud/git/connect-gitlab.md @@ -22,11 +22,11 @@ To connect your GitLab account: 2. Select **Linked Accounts** in the left menu. 3. Click **Link** to the right of your GitLab account. - + When you click **Link**, you will be redirected to GitLab and prompted to sign into your account. GitLab will then ask for your explicit authorization: - + Once you've accepted, you should be redirected back to dbt Cloud, and you'll see that your account has been linked to your profile. @@ -52,7 +52,7 @@ For more detail, GitLab has a [guide for creating a Group Application](https://d In GitLab, navigate to your group settings and select **Applications**. Here you'll see a form to create a new application. 
- + In GitLab, when creating your Group Application, input the following: @@ -67,7 +67,7 @@ Replace `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cl The application form in GitLab should look as follows when completed: - + Click **Save application** in GitLab, and GitLab will then generate an **Application ID** and **Secret**. These values will be available even if you close the app screen, so this is not the only chance you have to save them. @@ -76,7 +76,7 @@ If you're a Business Critical customer using [IP restrictions](/docs/cloud/secur ### Adding the GitLab OAuth application to dbt Cloud After you've created your GitLab application, you need to provide dbt Cloud information about the app. In dbt Cloud, account admins should navigate to **Account Settings**, click on the **Integrations** tab, and expand the GitLab section. - + In dbt Cloud, input the following values: @@ -92,7 +92,7 @@ Once the form is complete in dbt Cloud, click **Save**. You will then be redirected to GitLab and prompted to sign into your account. GitLab will ask for your explicit authorization: - + Once you've accepted, you should be redirected back to dbt Cloud, and your integration is ready for developers on your team to [personally authenticate with](#personally-authenticating-with-gitlab). @@ -103,7 +103,7 @@ To connect a personal GitLab account, dbt Cloud developers should navigate to Yo If your GitLab account is not connected, you’ll see "No connected account". Select **Link** to begin the setup process. You’ll be redirected to GitLab, and asked to authorize dbt Cloud in a grant screen. - + Once you approve authorization, you will be redirected to dbt Cloud, and you should see your connected account. You're now ready to start developing in the dbt Cloud IDE or dbt Cloud CLI. 
diff --git a/website/docs/docs/cloud/git/import-a-project-by-git-url.md b/website/docs/docs/cloud/git/import-a-project-by-git-url.md index 2ccaba1ec4d..83846bb1f0b 100644 --- a/website/docs/docs/cloud/git/import-a-project-by-git-url.md +++ b/website/docs/docs/cloud/git/import-a-project-by-git-url.md @@ -37,7 +37,7 @@ If you use GitHub, you can import your repo directly using [dbt Cloud's GitHub A - After adding this key, dbt Cloud will be able to read and write files in your dbt project. - Refer to [Adding a deploy key in GitHub](https://github.blog/2015-06-16-read-only-deploy-keys/) - + ## GitLab @@ -52,7 +52,7 @@ If you use GitLab, you can import your repo directly using [dbt Cloud's GitLab A - After saving this SSH key, dbt Cloud will be able to read and write files in your GitLab repository. - Refer to [Adding a read only deploy key in GitLab](https://docs.gitlab.com/ee/ssh/#per-repository-deploy-keys) - + ## BitBucket @@ -60,7 +60,7 @@ If you use GitLab, you can import your repo directly using [dbt Cloud's GitLab A - Next, click the **Add key** button and paste in the deploy key generated by dbt Cloud for your repository. - After saving this SSH key, dbt Cloud will be able to read and write files in your BitBucket repository. - + ## AWS CodeCommit @@ -109,17 +109,17 @@ If you use Azure DevOps and you are on the dbt Cloud Enterprise plan, you can im 2. We recommend using a dedicated service user for the integration to ensure that dbt Cloud's connection to Azure DevOps is not interrupted by changes to user permissions. - + 3. Next, click the **+ New Key** button to create a new SSH key for the repository. - + 4. Select a descriptive name for the key and then paste in the deploy key generated by dbt Cloud for your repository. 5. After saving this SSH key, dbt Cloud will be able to read and write files in your Azure DevOps repository. 
- + ## Other git providers diff --git a/website/docs/docs/cloud/git/setup-azure.md b/website/docs/docs/cloud/git/setup-azure.md index b24ec577935..843371be6ea 100644 --- a/website/docs/docs/cloud/git/setup-azure.md +++ b/website/docs/docs/cloud/git/setup-azure.md @@ -34,11 +34,11 @@ Many customers ask why they need to select Multitenant instead of Single tenant, 6. Add a redirect URI by selecting **Web** and, in the field, entering `https://YOUR_ACCESS_URL/complete/azure_active_directory`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cloud/regions-ip-addresses) for your region and plan. 7. Click **Register**. - + Here's what your app should look like before registering it: - + ## Add permissions to your new app @@ -51,7 +51,7 @@ Provide your new app access to Azure DevOps: 4. Select **Azure DevOps**. 5. Select the **user_impersonation** permission. This is the only permission available for Azure DevOps. - + ## Add another redirect URI @@ -63,7 +63,7 @@ You also need to add another redirect URI to your Azure AD application. This red `https://YOUR_ACCESS_URL/complete/azure_active_directory_service_user` 4. Click **Save**. - + @@ -77,7 +77,7 @@ If you have already connected your Azure DevOps account to Active Directory, the 4. Select the directory you want to connect. 5. Click **Connect**. - + ## Add your Azure AD app to dbt Cloud @@ -91,7 +91,7 @@ Once you connect your Azure AD app and Azure DevOps, you need to provide dbt Clo - **Application (client) ID:** Found in the Azure AD App. - **Client Secrets:** You need to first create a secret in the Azure AD App under **Client credentials**. Make sure to copy the **Value** field in the Azure AD App and paste it in the **Client Secret** field in dbt Cloud. You are responsible for the Azure AD app secret expiration and rotation. - **Directory(tenant) ID:** Found in the Azure AD App. - + Your Azure AD app should now be added to your dbt Cloud Account. 
People on your team who want to develop in the dbt Cloud IDE or dbt Cloud CLI can now personally [authorize Azure DevOps from their profiles](/docs/cloud/git/authenticate-azure). @@ -345,7 +345,7 @@ To connect the service user: 2. The admin should click **Link Azure Service User** in dbt Cloud. 3. The admin will be directed to Azure DevOps and must accept the Azure AD app's permissions. 4. Finally, the admin will be redirected to dbt Cloud, and the service user will be connected. - + Once connected, dbt Cloud displays the email address of the service user so you know which user's permissions are enabling headless actions in deployment environments. To change which account is connected, disconnect the profile in dbt Cloud, sign into the alternative Azure DevOps service account, and re-link the account in dbt Cloud. diff --git a/website/docs/docs/cloud/manage-access/auth0-migration.md b/website/docs/docs/cloud/manage-access/auth0-migration.md index 610c97e8b74..a40bb006d06 100644 --- a/website/docs/docs/cloud/manage-access/auth0-migration.md +++ b/website/docs/docs/cloud/manage-access/auth0-migration.md @@ -17,11 +17,11 @@ If you have not yet configured SSO in dbt Cloud, refer instead to our setup guid The Auth0 migration feature is being rolled out incrementally to customers who have SSO features already enabled. When the migration option has been enabled on your account, you will see **SSO Updates Available** on the right side of the menu bar, near the settings icon. - + Alternatively, you can start the process from the **Settings** page in the **Single Sign-on** pane. Click the **Begin Migration** button to start. - + Once you have opted to begin the migration process, the following steps will vary depending on the configured identity provider. You can just skip to the section that's right for your environment. 
These steps only apply to customers going through the migration; new setups will use the existing [setup instructions](/docs/cloud/manage-access/sso-overview). @@ -48,15 +48,15 @@ Below are sample steps to update. You must complete all of them to ensure uninte Here is an example of an updated SAML 2.0 setup in Okta. - + 2. Save the configuration, and your SAML settings will look something like this: - + 3. Toggle the `Enable new SSO authentication` option to ensure the traffic is routed correctly. _The new SSO migration action is final and cannot be undone_ - + 4. Save the settings and test the new configuration using the SSO login URL provided on the settings page. @@ -68,17 +68,17 @@ Below are steps to update. You must complete all of them to ensure uninterrupted 1. Open the [Google Cloud console](https://console.cloud.google.com/) and select the project with your dbt Cloud single sign-on settings. From the project page **Quick Access**, select **APIs and Services** - + 2. Click **Credentials** from the left side pane and click the appropriate name from **OAuth 2.0 Client IDs** - + 3. In the **Client ID for Web application** window, find the **Authorized Redirect URIs** field and click **Add URI** and enter `https:///login/callback`. Click **Save** once you are done. - + 4. _You will need a person with Google Workspace admin privileges to complete these steps in dbt Cloud_. In dbt Cloud, navigate to the **Account Settings**, click on **Single Sign-on**, and then click **Edit** on the right side of the SSO pane. Toggle the **Enable New SSO Authentication** option and select **Save**. This will trigger an authorization window from Google that will require admin credentials. _The migration action is final and cannot be undone_. Once the authentication has gone through, test the new configuration using the SSO login URL provided on the settings page. 
@@ -88,7 +88,7 @@ You must complete the domain authorization before you toggle `Enable New SSO Aut ::: - + ## Azure Active Directory @@ -98,15 +98,15 @@ Below are steps to update. You must complete all of them to ensure uninterrupted 1. Click **App Registrations** on the left side menu. - + 2. Select the proper **dbt Cloud** app (name may vary) from the list. From the app overview, click on the hyperlink next to **Redirect URI** - + 3. In the **Web** pane with **Redirect URIs**, click **Add URI** and enter the appropriate `https:///login/callback`. Save the settings and verify it is counted in the updated app overview. - + 4. Navigate to the dbt Cloud environment and open the **Account Settings**. Click the **Single Sign-on** option from the left side menu and click the **Edit** option from the right side of the SSO pane. The **domain** field is the domain your organization uses to login to Azure AD. Toggle the **Enable New SSO Authentication** option and **Save**. _Once this option is enabled, it cannot be undone._ @@ -116,4 +116,4 @@ You must complete the domain authorization before you toggle `Enable New SSO Aut ::: - + diff --git a/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md b/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md index 76e16039ae8..adf849c3ba1 100644 --- a/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md +++ b/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md @@ -130,7 +130,7 @@ to allocate for the user. If your account does not have an available license to allocate, you will need to add more licenses to your plan to complete the license change. - + ### Mapped configuration @@ -148,7 +148,7 @@ license. To assign Read-Only licenses to certain groups of users, create a new License Mapping for the Read-Only license type and include a comma separated list of IdP group names that should receive a Read-Only license at sign-in time. 
- Usage notes: diff --git a/website/docs/docs/cloud/manage-access/enterprise-permissions.md b/website/docs/docs/cloud/manage-access/enterprise-permissions.md index ac2d6258819..dcacda20deb 100644 --- a/website/docs/docs/cloud/manage-access/enterprise-permissions.md +++ b/website/docs/docs/cloud/manage-access/enterprise-permissions.md @@ -28,11 +28,11 @@ Role-Based Access Control (RBAC) is helpful for automatically assigning permissi 1. Click the gear icon to the top right and select **Account Settings**. From the **Team** section, click **Groups** - + 1. Select an existing group or create a new group to add RBAC. Name the group (this can be any name you like, but it's recommended to keep it consistent with the SSO groups). If you have configured SSO with SAML 2.0, you may have to use the GroupID instead of the name of the group. 2. Configure the SSO provider groups you want to add RBAC by clicking **Add** in the **SSO** section. These fields are case-sensitive and must match the source group formatting. 3. Configure the permissions for users within those groups by clicking **Add** in the **Access** section of the window. - + 4. When you've completed your configurations, click **Save**. Users will begin to populate the group automatically once they have signed in to dbt Cloud with their SSO credentials. diff --git a/website/docs/docs/cloud/manage-access/invite-users.md b/website/docs/docs/cloud/manage-access/invite-users.md index f79daebf45e..21be7010a30 100644 --- a/website/docs/docs/cloud/manage-access/invite-users.md +++ b/website/docs/docs/cloud/manage-access/invite-users.md @@ -20,11 +20,11 @@ You must have proper permissions to invite new users: 1. In your dbt Cloud account, select the gear menu in the upper right corner and then select **Account Settings**. 2. From the left sidebar, select **Users**. - + 3. Click on **Invite Users**. - + 4. 
In the **Email Addresses** field, enter the email addresses of the users you would like to invite separated by comma, semicolon, or a new line. 5. Select the license type for the batch of users from the **License** dropdown. @@ -40,7 +40,7 @@ dbt Cloud generates and sends emails from `support@getdbt.com` to the specified The email contains a link to create an account. When the user clicks on this they will be brought to one of two screens depending on whether SSO is configured or not. - + @@ -48,7 +48,7 @@ The email contains a link to create an account. When the user clicks on this the The default settings send the email, the user clicks the link, and is prompted to create their account: - +
@@ -56,7 +56,7 @@ The default settings send the email, the user clicks the link, and is prompted t If SSO is configured for the environment, the user clicks the link, is brought to a confirmation screen, and presented with a link to authenticate against the company's identity provider: - + @@ -73,4 +73,4 @@ Once the user completes this process, their email and user information will popu * What happens if I need to resend the invitation? _From the Users page, click on the invite record, and you will be presented with the option to resend the invitation._ * What can I do if I entered an email address incorrectly? _From the Users page, click on the invite record, and you will be presented with the option to revoke it. Once revoked, generate a new invitation to the correct email address._ - \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md b/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md index b0930af16f7..87018b14d56 100644 --- a/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md +++ b/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md @@ -28,7 +28,7 @@ To get started, you need to create a client ID and secret for [authentication](h In the BigQuery console, navigate to **APIs & Services** and select **Credentials**: - + On the **Credentials** page, you can see your existing keys, client IDs, and service accounts. @@ -46,7 +46,7 @@ Fill in the application, replacing `YOUR_ACCESS_URL` with the [appropriate Acces Then click **Create** to create the BigQuery OAuth app and see the app client ID and secret values. These values are available even if you close the app screen, so this isn't the only chance you have to save them. 
- + @@ -59,7 +59,7 @@ Now that you have an OAuth app set up in BigQuery, you'll need to add the client - add the client ID and secret from the BigQuery OAuth app under the **OAuth2.0 Settings** section - + ### Authenticating to BigQuery Once the BigQuery OAuth app is set up for a dbt Cloud project, each dbt Cloud user will need to authenticate with BigQuery in order to use the IDE. To do so: @@ -68,10 +68,10 @@ Once the BigQuery OAuth app is set up for a dbt Cloud project, each dbt Cloud us - Select **Credentials**. - choose your project from the list - select **Authenticate BigQuery Account** - + You will then be redirected to BigQuery and asked to approve the drive, cloud platform, and BigQuery scopes, unless the connection is less privileged. - + Select **Allow**. This redirects you back to dbt Cloud. You should now be an authenticated BigQuery user, ready to use the dbt Cloud IDE. diff --git a/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md b/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md index 8dcbb42ffa7..679133b7844 100644 --- a/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md +++ b/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md @@ -60,7 +60,7 @@ Now that you have an OAuth app set up in Databricks, you'll need to add the clie - select **Connection** to edit the connection details - add the `OAuth Client ID` and `OAuth Client Secret` from the Databricks OAuth app under the **Optional Settings** section - + ### Authenticating to Databricks (dbt Cloud IDE developer) @@ -72,6 +72,6 @@ Once the Databricks connection via OAuth is set up for a dbt Cloud project, each - Select `OAuth` as the authentication method, and click **Save** - Finalize by clicking the **Connect Databricks Account** button - + You will then be redirected to Databricks and asked to approve the connection. This redirects you back to dbt Cloud. You should now be an authenticated Databricks user, ready to use the dbt Cloud IDE. 
diff --git a/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md b/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md index 8e38a60dd27..5b9abb6058a 100644 --- a/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md +++ b/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md @@ -68,7 +68,7 @@ from Enter the Client ID and Client Secret into dbt Cloud to complete the creation of your Connection. - + ### Authorize Developer Credentials @@ -76,7 +76,7 @@ Once Snowflake SSO is enabled, users on the project will be able to configure th ### SSO OAuth Flow Diagram - + Once a user has authorized dbt Cloud with Snowflake via their identity provider, Snowflake will return a Refresh Token to the dbt Cloud application. dbt Cloud is then able to exchange this refresh token for an Access Token which can then be used to open a Snowflake connection and execute queries in the dbt Cloud IDE on behalf of users. diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md b/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md index 1e45de190f5..19779baf615 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md @@ -52,7 +52,7 @@ Client Secret for use in dbt Cloud. | **Authorized domains** | `getdbt.com` (US multi-tenant) `getdbt.com` and `dbt.com`(US Cell 1) `dbt.com` (EMEA or AU) | If deploying into a VPC, use the domain for your deployment | | **Scopes** | `email, profile, openid` | The default scopes are sufficient | - + 6. Save the **Consent screen** settings to navigate back to the **Create OAuth client id** page. @@ -65,7 +65,7 @@ Client Secret for use in dbt Cloud. | **Authorized Javascript origins** | `https://YOUR_ACCESS_URL` | | **Authorized Redirect URIs** | `https://YOUR_AUTH0_URI/login/callback` | - + 8. Press "Create" to create your new credentials. 
A popup will appear with a **Client ID** and **Client Secret**. Write these down as you will need them later! @@ -77,7 +77,7 @@ Group Membership information from the GSuite API. To enable the Admin SDK for this project, navigate to the [Admin SDK Settings page](https://console.developers.google.com/apis/api/admin.googleapis.com/overview) and ensure that the API is enabled. - + ## Configuration in dbt Cloud @@ -99,7 +99,7 @@ Settings. Cloud by navigating to `https://YOUR_ACCESS_URL/enterprise-login/LOGIN-SLUG`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cloud/regions-ip-addresses) for your region and plan. The `LOGIN-SLUG` must be unique across all dbt Cloud accounts, so pick a slug that uniquely identifies your company. - + 3. Click **Save & Authorize** to authorize your credentials. You should be dropped into the GSuite OAuth flow and prompted to log into dbt Cloud with your work email address. If authentication is successful, you will be @@ -109,7 +109,7 @@ Settings. you do not see a `groups` entry in the IdP attribute list, consult the following Troubleshooting steps. - + If the verification information looks appropriate, then you have completed the configuration of GSuite SSO. diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md b/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md index 79c33a28450..ba925fa2c24 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md @@ -426,7 +426,7 @@ To complete setup, follow the steps below in dbt Cloud: | Identity Provider Issuer | Paste the **Identity Provider Issuer** shown in the IdP setup instructions | | X.509 Certificate | Paste the **X.509 Certificate** shown in the IdP setup instructions;
**Note:** When the certificate expires, an Idp admin will have to generate a new one to be pasted into dbt Cloud for uninterrupted application access. | | Slug | Enter your desired login slug. | - 4. Click **Save** to complete setup for the SAML 2.0 integration. diff --git a/website/docs/docs/cloud/manage-access/sso-overview.md b/website/docs/docs/cloud/manage-access/sso-overview.md index 938587d59b3..b4954955c8c 100644 --- a/website/docs/docs/cloud/manage-access/sso-overview.md +++ b/website/docs/docs/cloud/manage-access/sso-overview.md @@ -24,7 +24,7 @@ Once you configure SSO, even partially, you cannot disable or revert it. When yo The diagram below explains the basic process by which users are provisioned in dbt Cloud upon logging in with SSO. - + #### Diagram Explanation diff --git a/website/docs/docs/cloud/secure/ip-restrictions.md b/website/docs/docs/cloud/secure/ip-restrictions.md index a0206ca038d..034b3a6c144 100644 --- a/website/docs/docs/cloud/secure/ip-restrictions.md +++ b/website/docs/docs/cloud/secure/ip-restrictions.md @@ -71,6 +71,6 @@ Once you are done adding all your ranges, IP restrictions can be enabled by sele Once enabled, when someone attempts to access dbt Cloud from a restricted IP, they will encounter one of the following messages depending on whether they use email & password or SSO login. - + - + diff --git a/website/docs/docs/cloud/secure/redshift-privatelink.md b/website/docs/docs/cloud/secure/redshift-privatelink.md index da5312876fb..c42c703556b 100644 --- a/website/docs/docs/cloud/secure/redshift-privatelink.md +++ b/website/docs/docs/cloud/secure/redshift-privatelink.md @@ -23,17 +23,17 @@ While Redshift Serverless does support Redshift-managed type VPC endpoints, this 1. On the running Redshift cluster, select the **Properties** tab. - + 2. In the **Granted accounts** section, click **Grant access**. - + 3. Enter the AWS account ID: `346425330055` - _NOTE: This account ID only applies to dbt Cloud Multi-Tenant environments. 
For Virtual Private/Single-Tenant account IDs please contact [Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support)._ 4. Choose **Grant access to all VPCs** —or— (optional) contact [Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support) for the appropriate regional VPC ID to designate in the **Grant access to specific VPCs** field. - + 5. Add the required information to the following template, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support): @@ -62,14 +62,14 @@ Creating an Interface VPC PrivateLink connection requires creating multiple AWS - **Standard Redshift** - Use IP addresses from the Redshift cluster’s **Network Interfaces** whenever possible. While IPs listed in the **Node IP addresses** section will work, they are also more likely to change. - + - There will likely be only one Network Interface (NI) to start, but if the cluster fails over to another availability zone (AZ), a new NI will also be created for that AZ. The NI IP from the original AZ will still work, but the new NI IP can also be added to the Target Group. If adding additional IPs, note that the NLB will also need to add the corresponding AZ. Once created, the NI(s) should stay the same (This is our observation from testing, but AWS does not officially document it). - **Redshift Serverless** - To find the IP addresses for Redshift Serverless instance locate and copy the endpoint (only the URL listed before the port) in the Workgroup configuration section of the AWS console for the instance. - + - From a command line run the command `nslookup ` using the endpoint found in the previous step and use the associated IP(s) for the Target Group. @@ -85,13 +85,13 @@ On the provisioned VPC endpoint service, click the **Allow principals** tab. Cli - Principal: `arn:aws:iam::346425330055:role/MTPL_Admin` - + ### 3. 
Obtain VPC Endpoint Service Name Once the VPC Endpoint Service is provisioned, you can find the service name in the AWS console by navigating to **VPC** → **Endpoint Services** and selecting the appropriate endpoint service. You can copy the service name field value and include it in your communication to dbt Cloud support. - + ### 4. Add the required information to the template below, and submit your request to [dbt Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support): ``` diff --git a/website/docs/docs/cloud/secure/snowflake-privatelink.md b/website/docs/docs/cloud/secure/snowflake-privatelink.md index bc8f30a5566..dd046259e4e 100644 --- a/website/docs/docs/cloud/secure/snowflake-privatelink.md +++ b/website/docs/docs/cloud/secure/snowflake-privatelink.md @@ -27,7 +27,7 @@ Users connecting to Snowflake using SSO over a PrivateLink connection from dbt C - AWS account ID: `346425330055` - _NOTE: This account ID only applies to dbt Cloud Multi-Tenant environments. For Virtual Private/Single-Tenant account IDs please contact [Support](https://docs.getdbt.com/community/resources/getting-help#dbt-cloud-support)._ - You will need to have `ACCOUNTADMIN` access to the Snowflake instance to submit a Support request. - + 2. After Snowflake has granted the requested access, run the Snowflake system function [SYSTEM$GET_PRIVATELINK_CONFIG](https://docs.snowflake.com/en/sql-reference/functions/system_get_privatelink_config.html) and copy the output. diff --git a/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md b/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md index 7e85cbb8b11..e104ea8640c 100644 --- a/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md +++ b/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md @@ -16,7 +16,7 @@ To set up a job to generate docs: 1. In the top left, click **Deploy** and select **Jobs**. 2. Create a new job or select an existing job and click **Settings**. 3. 
Under "Execution Settings," select **Generate docs on run**. - + 4. Click **Save**. Proceed to [configure project documentation](#configure-project-documentation) so your project generates the documentation when this job runs. @@ -44,7 +44,7 @@ You configure project documentation to generate documentation when the job you s 3. Navigate to **Projects** and select the project that needs documentation. 4. Click **Edit**. 5. Under **Artifacts**, select the job that should generate docs when it runs. - + 6. Click **Save**. ## Generating documentation @@ -65,4 +65,4 @@ These generated docs always show the last fully successful run, which means that The dbt Cloud IDE makes it possible to view [documentation](/docs/collaborate/documentation) for your dbt project while your code is still in development. With this workflow, you can inspect and verify what your project's generated documentation will look like before your changes are released to production. - + diff --git a/website/docs/docs/collaborate/documentation.md b/website/docs/docs/collaborate/documentation.md index b6636a84eee..1a989806851 100644 --- a/website/docs/docs/collaborate/documentation.md +++ b/website/docs/docs/collaborate/documentation.md @@ -29,7 +29,7 @@ Importantly, dbt also provides a way to add **descriptions** to models, columns, Here's an example docs site: - + ## Adding descriptions to your project To add descriptions to your project, use the `description:` key in the same files where you declare [tests](/docs/build/data-tests), like so: @@ -177,17 +177,17 @@ up to page views and sessions. ## Navigating the documentation site Using the docs interface, you can navigate to the documentation for a specific model. That might look something like this: - + Here, you can see a representation of the project structure, a markdown description for a model, and a list of all of the columns (with documentation) in the model. 
From a docs page, you can click the green button in the bottom-right corner of the webpage to expand a "mini-map" of your DAG. This pane (shown below) will display the immediate parents and children of the model that you're exploring. - + In this example, the `fct_subscription_transactions` model only has one direct parent. By clicking the "Expand" button in the top-right corner of the window, we can pivot the graph horizontally and view the full lineage for our model. This lineage is filterable using the `--select` and `--exclude` flags, which are consistent with the semantics of [model selection syntax](/reference/node-selection/syntax). Further, you can right-click to interact with the DAG, jump to documentation, or share links to your graph visualization with your coworkers. - + ## Deploying the documentation site diff --git a/website/docs/docs/collaborate/git/managed-repository.md b/website/docs/docs/collaborate/git/managed-repository.md index 6112b84d4c6..db8e9840ccd 100644 --- a/website/docs/docs/collaborate/git/managed-repository.md +++ b/website/docs/docs/collaborate/git/managed-repository.md @@ -13,7 +13,7 @@ To set up a project with a managed repository: 4. Select **Managed**. 5. Enter a name for the repository. For example, "analytics" or "dbt-models." 6. Click **Create**. - + dbt Cloud will host and manage this repository for you. If in the future you choose to host this repository elsewhere, you can export the information from dbt Cloud at any time. diff --git a/website/docs/docs/collaborate/git/merge-conflicts.md b/website/docs/docs/collaborate/git/merge-conflicts.md index 133a096da9c..c3c19b1e2a1 100644 --- a/website/docs/docs/collaborate/git/merge-conflicts.md +++ b/website/docs/docs/collaborate/git/merge-conflicts.md @@ -35,9 +35,9 @@ The dbt Cloud IDE will display: - The file name colored in red in the **Changes** section, with a warning icon. 
- If you press commit without resolving the conflict, the dbt Cloud IDE will display a pop-up box listing the files that need to be resolved. - + - + ## Resolve merge conflicts @@ -51,7 +51,7 @@ You can seamlessly resolve merge conflicts that involve competing line changes i 6. Repeat this process for every file that has a merge conflict. - + :::info Edit conflict files - If you open the conflict file under **Changes**, the file name will display something like `model.sql (last commit)` and the file is read-only and cannot be edited.
@@ -67,6 +67,6 @@ When you've resolved all the merge conflicts, the last step would be to commit t 3. The dbt Cloud IDE will return to its normal state and you can continue developing! - + - + diff --git a/website/docs/docs/collaborate/git/pr-template.md b/website/docs/docs/collaborate/git/pr-template.md index b85aa8a0d51..ddb4948dad9 100644 --- a/website/docs/docs/collaborate/git/pr-template.md +++ b/website/docs/docs/collaborate/git/pr-template.md @@ -9,7 +9,7 @@ open a new Pull Request for the code changes. To enable this functionality, ensu that a PR Template URL is configured in the Repository details page in your Account Settings. If this setting is blank, the IDE will prompt users to merge the changes directly into their default branch. - + ### PR Template URL by git provider diff --git a/website/docs/docs/collaborate/model-performance.md b/website/docs/docs/collaborate/model-performance.md index aeb18090751..7ef675b4e1e 100644 --- a/website/docs/docs/collaborate/model-performance.md +++ b/website/docs/docs/collaborate/model-performance.md @@ -27,7 +27,7 @@ Each data point links to individual models in Explorer. You can view historical metadata for up to the past three months. Select the time horizon using the filter, which defaults to a two-week lookback. - + ## The Model performance tab @@ -38,4 +38,4 @@ You can view trends in execution times, counts, and failures by using the Model Clicking on a data point reveals a table listing all job runs for that day, with each row providing a direct link to the details of a specific run. 
- \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/dbt-cloud-apis/service-tokens.md b/website/docs/docs/dbt-cloud-apis/service-tokens.md index a5a8a6c4807..b0b5fbd6cfe 100644 --- a/website/docs/docs/dbt-cloud-apis/service-tokens.md +++ b/website/docs/docs/dbt-cloud-apis/service-tokens.md @@ -110,7 +110,7 @@ On July 18, 2023, dbt Labs made critical infrastructure changes to service accou To rotate your token: 1. Navigate to **Account settings** and click **Service tokens** on the left side pane. 2. Verify the **Created** date for the token is _on or before_ July 18, 2023. - + 3. Click **+ New Token** on the top right side of the screen. Ensure the new token has the same permissions as the old one. 4. Copy the new token and replace the old one in your systems. Store it in a safe place, as it will not be available again once the creation screen is closed. 5. Delete the old token in dbt Cloud by clicking the **trash can icon**. _Only take this action after the new token is in place to avoid service disruptions_. diff --git a/website/docs/docs/dbt-cloud-apis/user-tokens.md b/website/docs/docs/dbt-cloud-apis/user-tokens.md index 5734f8ba35a..77e536b12a5 100644 --- a/website/docs/docs/dbt-cloud-apis/user-tokens.md +++ b/website/docs/docs/dbt-cloud-apis/user-tokens.md @@ -14,7 +14,7 @@ permissions of the user the that they were created for. You can find your User API token in the Profile page under the `API Access` label. 
- + ## FAQs diff --git a/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md b/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md index dc2cdb63748..0b588376c34 100644 --- a/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md +++ b/website/docs/docs/dbt-versions/release-notes/77-Sept-2023/removing-prerelease-versions.md @@ -12,4 +12,4 @@ Previously, when dbt Labs released a new [version](/docs/dbt-versions/core#how-d To see which version you are currently using and to upgrade, select **Deploy** in the top navigation bar and select **Environments**. Choose the preferred environment and click **Settings**. Click **Edit** to make a change to the current dbt version. dbt Labs recommends always using the latest version whenever possible to take advantage of new features and functionality. - \ No newline at end of file + \ No newline at end of file diff --git a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md index 38b017baa30..1aabe517076 100644 --- a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md +++ b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-details-and-logs-improvements.md @@ -16,4 +16,4 @@ Highlights include: - Cleaner look and feel with iconography - Helpful tool tips - + diff --git a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md index 0bc4b76d0fc..d4d299b1d36 100644 --- a/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md +++ b/website/docs/docs/dbt-versions/release-notes/81-May-2023/run-history-improvements.md @@ -8,7 +8,7 @@ tags: [May-2023, Scheduler] New usability and design 
improvements to the **Run History** dashboard in dbt Cloud are now available. These updates allow people to discover the information they need more easily by reducing the number of clicks, surfacing more relevant information, keeping people in flow state, and designing the look and feel to be more intuitive to use. - + Highlights include: diff --git a/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md b/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md index 9ceda7749cd..bdc89b4abde 100644 --- a/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md +++ b/website/docs/docs/dbt-versions/release-notes/86-Dec-2022/new-jobs-default-as-off.md @@ -10,5 +10,5 @@ To help save compute time, new jobs will no longer be triggered to run by defaul For more information, refer to [Deploy jobs](/docs/deploy/deploy-jobs). - + diff --git a/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md b/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md index 41e1a5265ca..2d0488d4488 100644 --- a/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md +++ b/website/docs/docs/dbt-versions/release-notes/92-July-2022/render-lineage-feature.md @@ -13,4 +13,4 @@ Large DAGs can take a long time (10 or more seconds, if not minutes) to render a The new button prevents large DAGs from rendering automatically. Instead, you can select **Render Lineage** to load the visualization. This should affect about 15% of the DAGs. 
- + diff --git a/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md b/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md index 90e6ac72fea..307786c6b85 100644 --- a/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md +++ b/website/docs/docs/dbt-versions/release-notes/95-March-2022/ide-timeout-message.md @@ -10,4 +10,4 @@ We fixed an issue where a spotty internet connection could cause the “IDE sess We updated the health check logic so it now excludes client-side connectivity issues from the IDE session check. If you lose your internet connection, we no longer update the health-check state. Now, losing internet connectivity will no longer cause this unexpected message. - + diff --git a/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md b/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md index 46c1f4bbd15..9ff5986b4da 100644 --- a/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md +++ b/website/docs/docs/dbt-versions/release-notes/95-March-2022/prep-and-waiting-time.md @@ -9,4 +9,4 @@ tags: [v1.1.46, March-02-2022] dbt Cloud now shows "waiting time" and "prep time" for a run, which used to be expressed in aggregate as "queue time". Waiting time captures the time dbt Cloud waits to run your job if there isn't an available run slot or if a previous run of the same job is still running. Prep time represents the time it takes dbt Cloud to ready your job to run in your cloud data warehouse. 
- + diff --git a/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md b/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md index 75697d32d17..052611f66e6 100644 --- a/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md +++ b/website/docs/docs/dbt-versions/upgrade-core-in-cloud.md @@ -9,7 +9,7 @@ In dbt Cloud, both jobs and environments are configured to use a specific versio Navigate to the settings page of an environment, then click **edit**. Click the **dbt Version** dropdown bar and make your selection. From this list, you can select an available version of Core to associate with this environment. - + Be sure to save your changes before navigating away. @@ -17,7 +17,7 @@ Be sure to save your changes before navigating away. Each job in dbt Cloud can be configured to inherit parameters from the environment it belongs to. - + The example job seen in the screenshot above belongs to the environment "Prod". It inherits the dbt version of its environment as shown by the **Inherited from ENVIRONMENT_NAME (DBT_VERSION)** selection. You may also manually override the dbt version of a specific job to be any of the current Core releases supported by Cloud by selecting another option from the dropdown. @@ -275,7 +275,7 @@ Once you have your project compiling and running on the latest version of dbt in - + Then add a job to the new testing environment that replicates one of the production jobs your team relies on. If that job runs smoothly, you should be all set to merge your branch into main and change your development and deployment environments in your main dbt project to run off the newest version of dbt Core. diff --git a/website/docs/docs/deploy/artifacts.md b/website/docs/docs/deploy/artifacts.md index 7ecc05355a0..9b3ae71e79c 100644 --- a/website/docs/docs/deploy/artifacts.md +++ b/website/docs/docs/deploy/artifacts.md @@ -10,11 +10,11 @@ When running dbt jobs, dbt Cloud generates and saves *artifacts*. 
You can use th While running any job can produce artifacts, you should only associate one production job with a given project to produce the project's artifacts. You can designate this connection in the **Project details** page. To access this page, click the gear icon in the upper right, select **Account Settings**, select your project, and click **Edit** in the lower right. Under **Artifacts**, select the jobs you want to produce documentation and source freshness artifacts for. - + If you don't see your job listed, you might need to edit the job and select **Run source freshness** and **Generate docs on run**. - + When you add a production job to a project, dbt Cloud updates the content and provides links to the production documentation and source freshness artifacts it generated for that project. You can see these links by clicking **Deploy** in the upper left, selecting **Jobs**, and then selecting the production job. From the job page, you can select a specific run to see how artifacts were updated for that run only. @@ -25,10 +25,10 @@ When set up, dbt Cloud updates the **Documentation** link in the header tab so i Note that both the job's commands and the docs generate step (triggered by the **Generate docs on run** checkbox) must succeed during the job invocation for the project-level documentation to be populated or updated. - + ### Source Freshness As with Documentation, configuring a job for the Source Freshness artifact setting also updates the Data Sources link under **Deploy**. The new link points to the latest Source Freshness report for the selected job. 
- + diff --git a/website/docs/docs/deploy/ci-jobs.md b/website/docs/docs/deploy/ci-jobs.md index 9f0bafddaef..149a6951fdc 100644 --- a/website/docs/docs/deploy/ci-jobs.md +++ b/website/docs/docs/deploy/ci-jobs.md @@ -117,10 +117,10 @@ If you're experiencing any issues, review some of the common questions and answe First, make sure you have the native GitHub authentication, native GitLab authentication, or native Azure DevOps authentication set up depending on which git provider you use. After you have gone through those steps, go to Account Settings, select Projects and click on the project you'd like to reconnect through native GitHub, GitLab, or Azure DevOps auth. Then click on the repository link.



Once you're in the repository page, select Edit and then Disconnect Repository at the bottom.

- +

Confirm that you'd like to disconnect your repository. You should then see a new Configure a repository link in your old repository's place. Click through to the configuration page:

- +

Select the GitHub, GitLab, or Azure DevOps tab and reselect your repository. That should complete the setup of the project and enable you to set up a dbt Cloud CI job.
diff --git a/website/docs/docs/deploy/dashboard-status-tiles.md b/website/docs/docs/deploy/dashboard-status-tiles.md index 4da0f859546..d9e33fc32d6 100644 --- a/website/docs/docs/deploy/dashboard-status-tiles.md +++ b/website/docs/docs/deploy/dashboard-status-tiles.md @@ -9,11 +9,11 @@ In dbt Cloud, the [Discovery API](/docs/dbt-cloud-apis/discovery-api) can power ## Functionality The dashboard status tile looks like this: - + The data freshness check fails if any sources feeding into the exposure are stale. The data quality check fails if any dbt tests fail. A failure state could look like this: - + Clicking into **see details** from the Dashboard Status Tile takes you to a landing page where you can learn more about the specific sources, models, and tests feeding into this exposure. @@ -56,11 +56,11 @@ Note that Mode has also built its own [integration](https://mode.com/get-dbt/) w Looker does not allow you to directly embed HTML and instead requires creating a [custom visualization](https://docs.looker.com/admin-options/platform/visualizations). One way to do this for admins is to: - Add a [new visualization](https://fishtown.looker.com/admin/visualizations) on the visualization page for Looker admins. You can use [this URL](https://metadata.cloud.getdbt.com/static/looker-viz.js) to configure a Looker visualization powered by the iFrame. It will look like this: - + - Once you have set up your custom visualization, you can use it on any dashboard! You can configure it with the exposure name, jobID, and token relevant to that dashboard. - + ### Tableau Tableau does not require you to embed an iFrame. 
You only need to use a Web Page object on your Tableau Dashboard and a URL in the following format: @@ -79,7 +79,7 @@ https://metadata.cloud.getdbt.com/exposure-tile?name=&jobId= + ### Sigma @@ -99,4 +99,4 @@ https://metadata.au.dbt.com/exposure-tile?name=&jobId=&to ``` ::: - + diff --git a/website/docs/docs/deploy/deploy-jobs.md b/website/docs/docs/deploy/deploy-jobs.md index 90ba0c7796c..cee6e245359 100644 --- a/website/docs/docs/deploy/deploy-jobs.md +++ b/website/docs/docs/deploy/deploy-jobs.md @@ -80,7 +80,7 @@ dbt Cloud uses [Coordinated Universal Time](https://en.wikipedia.org/wiki/Coordi To fully customize the scheduling of your job, choose the **Custom cron schedule** option and use the cron syntax. With this syntax, you can specify the minute, hour, day of the month, month, and day of the week, allowing you to set up complex schedules like running a job on the first Monday of each month. - + Use tools such as [crontab.guru](https://crontab.guru/) to generate the correct cron syntax. This tool allows you to input cron snippets and returns their plain English translations. diff --git a/website/docs/docs/deploy/deployment-tools.md b/website/docs/docs/deploy/deployment-tools.md index 64fcb1dadae..cca2368f38a 100644 --- a/website/docs/docs/deploy/deployment-tools.md +++ b/website/docs/docs/deploy/deployment-tools.md @@ -19,8 +19,8 @@ If your organization is using [Airflow](https://airflow.apache.org/), there are Installing the [dbt Cloud Provider](https://airflow.apache.org/docs/apache-airflow-providers-dbt-cloud/stable/index.html) to orchestrate dbt Cloud jobs. This package contains multiple Hooks, Operators, and Sensors to complete various actions within dbt Cloud. - - + + @@ -71,7 +71,7 @@ If your organization is using [Prefect](https://www.prefect.io/), the way you wi - As jobs are executing, you can poll dbt to see whether or not the job completes without failures, through the [Prefect user interface (UI)](https://docs.prefect.io/ui/overview/). 
- + diff --git a/website/docs/docs/deploy/source-freshness.md b/website/docs/docs/deploy/source-freshness.md index 3c4866cd084..2f9fe6bc007 100644 --- a/website/docs/docs/deploy/source-freshness.md +++ b/website/docs/docs/deploy/source-freshness.md @@ -6,7 +6,7 @@ description: "Validate that data freshness meets expectations and alert if stale dbt Cloud provides a helpful interface around dbt's [source data freshness](/docs/build/sources#snapshotting-source-data-freshness) calculations. When a dbt Cloud job is configured to snapshot source data freshness, dbt Cloud will render a user interface showing you the state of the most recent snapshot. This interface is intended to help you determine if your source data freshness is meeting the service level agreement (SLA) that you've defined for your organization. - + ### Enabling source freshness snapshots @@ -15,7 +15,7 @@ dbt Cloud provides a helpful interface around dbt's [source data freshness](/doc - Select the **Generate docs on run** checkbox to automatically [generate project docs](/docs/collaborate/build-and-view-your-docs#set-up-a-documentation-job). - Select the **Run source freshness** checkbox to enable [source freshness](#checkbox) as the first step of the job. - + To enable source freshness snapshots, firstly make sure to configure your sources to [snapshot freshness information](/docs/build/sources#snapshotting-source-data-freshness). You can add source freshness to the list of commands in the job run steps or enable the checkbox. However, you can expect different outcomes when you configure a job by selecting the **Run source freshness** checkbox compared to adding the command to the run steps. @@ -27,7 +27,7 @@ Review the following options and outcomes: | **Add as a run step** | Add the `dbt source freshness` command to a job anywhere in your list of run steps. However, if your source data is out of date — this step will "fail", and subsequent steps will not run. 
dbt Cloud will trigger email notifications (if configured) based on the end state of this step.

You can create a new job to snapshot source freshness.

If you *do not* want your models to run if your source data is out of date, then it could be a good idea to run `dbt source freshness` as the first step in your job. Otherwise, we recommend adding `dbt source freshness` as the last step in the job, or creating a separate job just for this task. | - + ### Source freshness snapshot frequency diff --git a/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md b/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md index e6a50443837..f41bceab12d 100644 --- a/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md +++ b/website/docs/docs/running-a-dbt-project/using-the-dbt-ide.md @@ -16,11 +16,11 @@ New dbt Cloud accounts will automatically be created with a Development Environm To create a development environment, choose **Deploy** > **Environments** from the top left. Then, click **Create Environment**. - + Enter an environment **Name** that would help you identify it among your other environments (for example, `Nate's Development Environment`). Choose **Development** as the **Environment Type**. You can also select which **dbt Version** to use at this time. For compatibility reasons, we recommend that you select the same dbt version that you plan to use in your deployment environment. Finally, click **Save** to finish creating your development environment. - + ### Setting up developer credentials @@ -28,14 +28,14 @@ The IDE uses *developer credentials* to connect to your database. These develope New dbt Cloud accounts should have developer credentials created automatically as a part of Project creation in the initial application setup. - + New users on existing accounts *might not* have their development credentials already configured. To manage your development credentials: 1. 
Navigate to your **Credentials** under **Your Profile** settings, which you can access at `https://YOUR_ACCESS_URL/settings/profile#credentials`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cloud/regions-ip-addresses) for your region and plan. 2. Select the relevant project in the list. After entering your developer credentials, you'll be able to access the dbt IDE. - + ### Compiling and running SQL diff --git a/website/docs/faqs/API/rotate-token.md b/website/docs/faqs/API/rotate-token.md index 0b808fa9176..4470de72d5a 100644 --- a/website/docs/faqs/API/rotate-token.md +++ b/website/docs/faqs/API/rotate-token.md @@ -19,7 +19,7 @@ To automatically rotate your API key: 2. Select **API Access** from the lefthand side. 3. In the **API** pane, click `Rotate`. - + diff --git a/website/docs/faqs/Accounts/change-users-license.md b/website/docs/faqs/Accounts/change-users-license.md index ed12ba5dc14..8755b946126 100644 --- a/website/docs/faqs/Accounts/change-users-license.md +++ b/website/docs/faqs/Accounts/change-users-license.md @@ -10,10 +10,10 @@ To change the license type for a user from `developer` to `read-only` or `IT` in 1. From dbt Cloud, click the gear icon at the top right and select **Account Settings**. - + 2. In **Account Settings**, select **Users** under **Teams**. 3. Select the user you want to remove, and click **Edit** in the bottom of their profile. 4. For the **License** option, choose **Read-only** or **IT** (from **Developer**), and click **Save**. - + diff --git a/website/docs/faqs/Accounts/cloud-upgrade-instructions.md b/website/docs/faqs/Accounts/cloud-upgrade-instructions.md index ef2ff8e4cd3..d16651a944c 100644 --- a/website/docs/faqs/Accounts/cloud-upgrade-instructions.md +++ b/website/docs/faqs/Accounts/cloud-upgrade-instructions.md @@ -32,7 +32,7 @@ To unlock your account and select a plan, review the following guidance per plan 3. Confirm your plan selection on the pop up message. 4. 
This automatically unlocks your dbt Cloud account, and you can now enjoy the benefits of the Developer plan. 🎉 - + ### Team plan @@ -42,7 +42,7 @@ To unlock your account and select a plan, review the following guidance per plan 4. Enter your payment details and click **Save**. 5. This automatically unlocks your dbt Cloud account, and you can now enjoy the benefits of the Team plan. 🎉 - + ### Enterprise plan @@ -50,7 +50,7 @@ To unlock your account and select a plan, review the following guidance per plan 2. Click **Contact Sales** on the right. This opens a chat window for you to contact the dbt Cloud Support team, who will connect you to our Sales team. 3. Once you submit your request, our Sales team will contact you with more information. - + 4. Alternatively, you can [contact](https://www.getdbt.com/contact/) our Sales team directly to chat about how dbt Cloud can help you and your team. diff --git a/website/docs/faqs/Git/git-migration.md b/website/docs/faqs/Git/git-migration.md index 775ae3679e3..156227d59ae 100644 --- a/website/docs/faqs/Git/git-migration.md +++ b/website/docs/faqs/Git/git-migration.md @@ -16,7 +16,7 @@ To migrate from one git provider to another, refer to the following steps to avo 2. Go back to dbt Cloud and set up your [integration for the new git provider](/docs/cloud/git/connect-github), if needed. 3. Disconnect the old repository in dbt Cloud by going to **Account Settings** and then **Projects**. Click on the **Repository** link, then click **Edit** and **Disconnect**. - + 4. On the same page, connect to the new git provider repository by clicking **Configure Repository** - If you're using the native integration, you may need to OAuth to it. 
diff --git a/website/docs/faqs/Project/delete-a-project.md b/website/docs/faqs/Project/delete-a-project.md index 21f16cbfaec..5fde3fee9cd 100644 --- a/website/docs/faqs/Project/delete-a-project.md +++ b/website/docs/faqs/Project/delete-a-project.md @@ -9,10 +9,10 @@ To delete a project in dbt Cloud, you must be the account owner or have admin pr 1. From dbt Cloud, click the gear icon at the top right corner and select **Account Settings**. - + 2. In **Account Settings**, select **Projects**. Click the project you want to delete from the **Projects** page. 3. Click the edit icon in the lower right-hand corner of the **Project Details**. A **Delete** option will appear on the left side of the same details view. 4. Select **Delete**. Confirm the action to delete the project. There is no account password prompt, and the project is deleted immediately after confirmation. Once a project is deleted, this action cannot be undone. - + diff --git a/website/docs/guides/adapter-creation.md b/website/docs/guides/adapter-creation.md index 12bda4726f9..28e0e8253ad 100644 --- a/website/docs/guides/adapter-creation.md +++ b/website/docs/guides/adapter-creation.md @@ -107,7 +107,7 @@ A set of *materializations* and their corresponding helper macros defined in dbt Below is a diagram of how dbt-postgres, the adapter at the center of dbt-core, works. 
- + ## Prerequisites @@ -1225,17 +1225,17 @@ This can vary substantially depending on the nature of the release but a good ba Breaking this down: - Visually distinctive announcement - make it clear this is a release - + - Short written description of what is in the release - + - Links to additional resources - + - Implementation instructions: - + - Future plans - + - Contributor recognition (if applicable) - + ## Verify a new adapter diff --git a/website/docs/guides/bigquery-qs.md b/website/docs/guides/bigquery-qs.md index d961a27018a..4f461a3cf3a 100644 --- a/website/docs/guides/bigquery-qs.md +++ b/website/docs/guides/bigquery-qs.md @@ -56,7 +56,7 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen Click **Run**, then check for results from the queries. For example:
- +
2. Create new datasets from the [BigQuery Console](https://console.cloud.google.com/bigquery). For more information, refer to [Create datasets](https://cloud.google.com/bigquery/docs/datasets#create-dataset) in the Google Cloud docs. Datasets in BigQuery are equivalent to schemas in a traditional database. On the **Create dataset** page: - **Dataset ID** — Enter a name that fits the purpose. This name is used like schema in fully qualified references to your database objects such as `database.schema.table`. As an example for this guide, create one for `jaffle_shop` and another one for `stripe` afterward. @@ -64,7 +64,7 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen - **Enable table expiration** — Leave it unselected (the default). The default for the billing table expiration is 60 days. Because billing isn’t enabled for this project, GCP defaults to deprecating tables. - **Google-managed encryption key** — This option is available under **Advanced options**. Allow Google to manage encryption (the default).
- +
3. After you create the `jaffle_shop` dataset, create one for `stripe` with all the same values except for **Dataset ID**. diff --git a/website/docs/guides/codespace-qs.md b/website/docs/guides/codespace-qs.md index c399eb494a9..b28b0ddaacf 100644 --- a/website/docs/guides/codespace-qs.md +++ b/website/docs/guides/codespace-qs.md @@ -35,7 +35,7 @@ dbt Labs provides a [GitHub Codespace](https://docs.github.com/en/codespaces/ove 1. Click **Code** (at the top of the new repository’s page). Under the **Codespaces** tab, choose **Create codespace on main**. Depending on how you've configured your computer's settings, this either opens a new browser tab with the Codespace development environment with VSCode running in it or opens a new VSCode window with the codespace in it. 1. Wait for the codespace to finish building by waiting for the `postCreateCommand` command to complete; this can take several minutes: - + When this command completes, you can start using the codespace development environment. The terminal the command ran in will close and you will get a prompt in a brand new terminal. diff --git a/website/docs/guides/custom-cicd-pipelines.md b/website/docs/guides/custom-cicd-pipelines.md index b21fe13b19b..1778098f752 100644 --- a/website/docs/guides/custom-cicd-pipelines.md +++ b/website/docs/guides/custom-cicd-pipelines.md @@ -144,7 +144,7 @@ In Azure: - Click *OK* and then *Save* to save the variable - Save your new Azure pipeline - + diff --git a/website/docs/guides/databricks-qs.md b/website/docs/guides/databricks-qs.md index 98c215382f6..cb01daec394 100644 --- a/website/docs/guides/databricks-qs.md +++ b/website/docs/guides/databricks-qs.md @@ -41,7 +41,7 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen 1. Use your existing account or sign up for a Databricks account at [Try Databricks](https://databricks.com/). Complete the form with your user information.
- +
2. For the purpose of this tutorial, you will be selecting AWS as our cloud provider but if you use Azure or GCP internally, please choose one of them. The setup process will be similar. @@ -49,28 +49,28 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen 4. After setting up your password, you will be guided to choose a subscription plan. Select the `Premium` or `Enterprise` plan to access the SQL Compute functionality required for using the SQL warehouse for dbt. We have chosen `Premium` for this tutorial. Click **Continue** after selecting your plan.
- +
5. Click **Get Started** when you reach the page shown below, and then click **Confirm** after you validate that you have everything needed.
- +
- +
6. Now it's time to create your first workspace. A Databricks workspace is an environment for accessing all of your Databricks assets. The workspace organizes objects like notebooks, SQL warehouses, and clusters in one place. Provide the name of your workspace, choose the appropriate AWS region, and click **Start Quickstart**. You might see the checkbox **I have data in S3 that I want to query with Databricks**. You do not need to select it for the purpose of this tutorial.
- +
7. By clicking on `Start Quickstart`, you will be redirected to AWS and asked to log in if you haven’t already. After logging in, you should see a page similar to this.
- +
:::tip @@ -79,16 +79,16 @@ If you get a session error and don’t get redirected to this page, you can go b 8. There is no need to change any of the pre-filled out fields in the Parameters. Just add in your Databricks password under **Databricks Account Credentials**. Check off the Acknowledgement and click **Create stack**.
- +
- +
10. Go back to the Databricks tab. You should see that your workspace is ready to use.
- +
11. Now let’s jump into the workspace. Click **Open** and log into the workspace using the same login as you used to log into the account. @@ -101,7 +101,7 @@ If you get a session error and don’t get redirected to this page, you can go b 2. First we need a SQL warehouse. Find the drop down menu and toggle into the SQL space.
- +
3. We will be setting up a SQL warehouse now. Select **SQL Warehouses** from the left hand side console. You will see that a default SQL Warehouse exists. @@ -109,12 +109,12 @@ If you get a session error and don’t get redirected to this page, you can go b 5. Once the SQL Warehouse is up, click **New** and then **File upload** on the dropdown menu.
- +
6. Let's load the Jaffle Shop Customers data first. Drop the `jaffle_shop_customers.csv` file into the UI.
- +
7. Update the Table Attributes at the top: @@ -128,7 +128,7 @@ If you get a session error and don’t get redirected to this page, you can go b - LAST_NAME = string
- +
8. Click **Create** on the bottom once you’re done. @@ -136,11 +136,11 @@ If you get a session error and don’t get redirected to this page, you can go b 9. Now let’s do the same for `Jaffle Shop Orders` and `Stripe Payments`.
- +
- +
10. Once that's done, make sure you can query the training data. Navigate to the `SQL Editor` through the left hand menu. This will bring you to a query editor. @@ -153,7 +153,7 @@ If you get a session error and don’t get redirected to this page, you can go b ```
- +
12. To ensure any users who might be working on your dbt project has access to your object, run this command. diff --git a/website/docs/guides/dbt-python-snowpark.md b/website/docs/guides/dbt-python-snowpark.md index fce0ad692f6..110445344e9 100644 --- a/website/docs/guides/dbt-python-snowpark.md +++ b/website/docs/guides/dbt-python-snowpark.md @@ -51,19 +51,19 @@ Overall we are going to set up the environments, build scalable pipelines in dbt 1. Log in to your trial Snowflake account. You can [sign up for a Snowflake Trial Account using this form](https://signup.snowflake.com/) if you don’t have one. 2. Ensure that your account is set up using **AWS** in the **US East (N. Virginia)**. We will be copying the data from a public AWS S3 bucket hosted by dbt Labs in the us-east-1 region. By ensuring our Snowflake environment setup matches our bucket region, we avoid any multi-region data copy and retrieval latency issues. - + 3. After creating your account and verifying it from your sign-up email, Snowflake will direct you back to the UI called Snowsight. 4. When Snowsight first opens, your window should look like the following, with you logged in as the ACCOUNTADMIN with demo worksheets open: - + 5. Navigate to **Admin > Billing & Terms**. Click **Enable > Acknowledge & Continue** to enable Anaconda Python Packages to run in Snowflake. - + - + 6. Finally, create a new Worksheet by selecting **+ Worksheet** in the upper right corner. @@ -80,7 +80,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 3. Rename the worksheet to `data setup script` since we will be placing code in this worksheet to ingest the Formula 1 data. Make sure you are still logged in as the **ACCOUNTADMIN** and select the **COMPUTE_WH** warehouse. - + 4. Copy the following code into the main body of the Snowflake worksheet. 
You can also find this setup script under the `setup` folder in the [Git repository](https://github.com/dbt-labs/python-snowpark-formula1/blob/main/setup/setup_script_s3_to_snowflake.sql). The script is long since it's bring in all of the data we'll need today! @@ -233,7 +233,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 5. Ensure all the commands are selected before running the query — an easy way to do this is to use Ctrl-a to highlight all of the code in the worksheet. Select **run** (blue triangle icon). Notice how the dot next to your **COMPUTE_WH** turns from gray to green as you run the query. The **status** table is the final table of all 8 tables loaded in. - + 6. Let’s unpack that pretty long query we ran into component parts. We ran this query to load in our 8 Formula 1 tables from a public S3 bucket. To do this, we: - Created a new database called `formula1` and a schema called `raw` to place our raw (untransformed) data into. @@ -244,7 +244,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 7. Now let's take a look at some of our cool Formula 1 data we just loaded up! 1. Create a new worksheet by selecting the **+** then **New Worksheet**. - + 2. Navigate to **Database > Formula1 > RAW > Tables**. 3. Query the data using the following code. There are only 76 rows in the circuits table, so we don’t need to worry about limiting the amount of data we query. @@ -256,7 +256,7 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 5. Review the query results, you should see information about Formula 1 circuits, starting with Albert Park in Australia! 6. Finally, ensure you have all 8 tables starting with `CIRCUITS` and ending with `STATUS`. Now we are ready to connect into dbt Cloud! - + ## Configure dbt Cloud @@ -264,19 +264,19 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 2. 
Navigate out of your worksheet back by selecting **home**. 3. In Snowsight, confirm that you are using the **ACCOUNTADMIN** role. 4. Navigate to the **Admin** **> Partner Connect**. Find **dbt** either by using the search bar or navigating the **Data Integration**. Select the **dbt** tile. - + 5. You should now see a new window that says **Connect to dbt**. Select **Optional Grant** and add the `FORMULA1` database. This will grant access for your new dbt user role to the FORMULA1 database. - + 6. Ensure the `FORMULA1` is present in your optional grant before clicking **Connect**.  This will create a dedicated dbt user, database, warehouse, and role for your dbt Cloud trial. - + 7. When you see the **Your partner account has been created** window, click **Activate**. 8. You should be redirected to a dbt Cloud registration page. Fill out the form. Make sure to save the password somewhere for login in the future. - + 9. Select **Complete Registration**. You should now be redirected to your dbt Cloud account, complete with a connection to your Snowflake account, a deployment and a development environment, and a sample job. @@ -286,43 +286,43 @@ We need to obtain our data source by copying our Formula 1 data into Snowflake t 1. First we are going to change the name of our default schema to where our dbt models will build. By default, the name is `dbt_`. We will change this to `dbt_` to create your own personal development schema. To do this, select **Profile Settings** from the gear icon in the upper right. - + 2. Navigate to the **Credentials** menu and select **Partner Connect Trial**, which will expand the credentials menu. - + 3. Click **Edit** and change the name of your schema from `dbt_` to `dbt_YOUR_NAME` replacing `YOUR_NAME` with your initials and name (`hwatson` is used in the lab screenshots). Be sure to click **Save** for your changes! - + 4. We now have our own personal development schema, amazing! 
When we run our first dbt models they will build into this schema. 5. Let’s open up dbt Cloud’s Integrated Development Environment (IDE) and familiarize ourselves. Choose **Develop** at the top of the UI. 6. When the IDE is done loading, click **Initialize dbt project**. The initialization process creates a collection of files and folders necessary to run your dbt project. - + 7. After the initialization is finished, you can view the files and folders in the file tree menu. As we move through the workshop we'll be sure to touch on a few key files and folders that we'll work with to build out our project. 8. Next click **Commit and push** to commit the new files and folders from the initialize step. We always want our commit messages to be relevant to the work we're committing, so be sure to provide a message like `initialize project` and select **Commit Changes**. - + - + 9. [Committing](https://www.atlassian.com/git/tutorials/saving-changes/git-commit) your work here will save it to the managed git repository that was created during the Partner Connect signup. This initial commit is the only commit that will be made directly to our `main` branch and from *here on out we'll be doing all of our work on a development branch*. This allows us to keep our development work separate from our production code. 10. There are a couple of key features to point out about the IDE before we get to work. It is a text editor, an SQL and Python runner, and a CLI with Git version control all baked into one package! This allows you to focus on editing your SQL and Python files, previewing the results with the SQL runner (it even runs Jinja!), and building models at the command line without having to move between different applications. The Git workflow in dbt Cloud allows both Git beginners and experts alike to be able to easily version control all of their work with a couple clicks. - + 11. Let's run our first dbt models! 
Two example models are included in your dbt project in the `models/examples` folder that we can use to illustrate how to run dbt at the command line. Type `dbt run` into the command line and click **Enter** on your keyboard. When the run bar expands you'll be able to see the results of the run, where you should see the run complete successfully. - + 12. The run results allow you to see the code that dbt compiles and sends to Snowflake for execution. To view the logs for this run, select one of the model tabs using the  **>** icon and then **Details**. If you scroll down a bit you'll be able to see the compiled code and how dbt interacts with Snowflake. Given that this run took place in our development environment, the models were created in your development schema. - + 13. Now let's switch over to Snowflake to confirm that the objects were actually created. Click on the three dots **…** above your database objects and then **Refresh**. Expand the **PC_DBT_DB** database and you should see your development schema. Select the schema, then **Tables**  and **Views**. Now you should be able to see `MY_FIRST_DBT_MODEL` as a table and `MY_SECOND_DBT_MODEL` as a view. - + ## Create branch and set up project configs @@ -414,15 +414,15 @@ dbt Labs has developed a [project structure guide](/best-practices/how-we-struct 1. In your file tree, use your cursor and hover over the `models` subdirectory, click the three dots **…** that appear to the right of the folder name, then select **Create Folder**. We're going to add two new folders to the file path, `staging` and `formula1` (in that order) by typing `staging/formula1` into the file path. - - + + - If you click into your `models` directory now, you should see the new `staging` folder nested within `models` and the `formula1` folder nested within `staging`. 2. Create two additional folders the same as the last step. Within the `models` subdirectory, create new directories `marts/core`. 3. 
We will need to create a few more folders and subfolders using the UI. After you create all the necessary folders, your folder tree should look like this when it's all done: - + Remember you can always reference the entire project in [GitHub](https://github.com/dbt-labs/python-snowpark-formula1/tree/python-formula1) to view the complete folder and file strucutre. @@ -742,21 +742,21 @@ The next step is to set up the staging models for each of the 8 source tables. G After the source and all the staging models are complete for each of the 8 tables, your staging folder should look like this: - + 1. It’s a good time to delete our example folder since these two models are extraneous to our formula1 pipeline and `my_first_model` fails a `not_null` test that we won’t spend time investigating. dbt Cloud will warn us that this folder will be permanently deleted, and we are okay with that so select **Delete**. - + 1. Now that the staging models are built and saved, it's time to create the models in our development schema in Snowflake. To do this we're going to enter into the command line `dbt build` to run all of the models in our project, which includes the 8 new staging models and the existing example models. Your run should complete successfully and you should see green checkmarks next to all of your models in the run results. We built our 8 staging models as views and ran 13 source tests that we configured in the `f1_sources.yml` file with not that much code, pretty cool! - + Let's take a quick look in Snowflake, refresh database objects, open our development schema, and confirm that the new models are there. If you can see them, then we're good to go! - + Before we move onto the next section, be sure to commit your new models to your Git branch. Click **Commit and push** and give your commit a message like `profile, sources, and staging setup` before moving on. @@ -1055,7 +1055,7 @@ By now, we are pretty good at creating new files in the correct directories so w 1. 
Let’s talk about our lineage so far. It’s looking good 😎. We’ve shown how SQL can be used to make data type, column name changes, and handle hierarchical joins really well; all while building out our automated lineage! - + 1. Time to **Commit and push** our changes and give your commit a message like `intermediate and fact models` before moving on. @@ -1128,7 +1128,7 @@ First, we want to find out: which constructor had the fastest pit stops in 2021? - The `snowflake-snowpark-python` library has been picked up to execute our Python code. Even though this wasn’t explicitly stated this is picked up by the dbt class object because we need our Snowpark package to run Python! Python models take a bit longer to run than SQL models, however we could always speed this up by using [Snowpark-optimized Warehouses](https://docs.snowflake.com/en/user-guide/warehouses-snowpark-optimized.html) if we wanted to. Our data is sufficiently small, so we won’t worry about creating a separate warehouse for Python versus SQL files today. - + The rest of our **Details** output gives us information about how dbt and Snowpark for Python are working together to define class objects and apply a specific set of methods to run our models. @@ -1142,7 +1142,7 @@ First, we want to find out: which constructor had the fastest pit stops in 2021? ``` and preview the output: - + Not only did Red Bull have the fastest average pit stops by nearly 40 seconds, they also had the smallest standard deviation, meaning they are both fastest and most consistent teams in pit stops. By using the `.describe()` method we were able to avoid verbose SQL requiring us to create a line of code per column and repetitively use the `PERCENTILE_COUNT()` function. @@ -1187,7 +1187,7 @@ First, we want to find out: which constructor had the fastest pit stops in 2021? in the command bar. 12. Once again previewing the output of our data using the same steps for our `fastest_pit_stops_by_constructor` model. 
- + We can see that it looks like lap times are getting consistently faster over time. Then in 2010 we see an increase occur! Using outside subject matter context, we know that significant rule changes were introduced to Formula 1 in 2010 and 2011 causing slower lap times. @@ -1314,7 +1314,7 @@ At a high level we’ll be: - The `.apply()` function in the pandas library is used to apply a function to a specified axis of a DataFrame or a Series. In our case the function we used was our lambda function! - The `.apply()` function takes two arguments: the first is the function to be applied, and the second is the axis along which the function should be applied. The axis can be specified as 0 for rows or 1 for columns. We are using the default value of 0 so we aren’t explicitly writing it in the code. This means that the function will be applied to each *row* of the DataFrame or Series. 6. Let’s look at the preview of our clean dataframe after running our `ml_data_prep` model: - + ### Covariate encoding @@ -1565,7 +1565,7 @@ If you haven’t seen code like this before or use joblib files to save machine - Right now our model is only in memory, so we need to use our nifty function `save_file` to save our model file to our Snowflake stage. We save our model as a joblib file so Snowpark can easily call this model object back to create predictions. We really don’t need to know much else as a data practitioner unless we want to. It’s worth noting that joblib files aren’t able to be queried directly by SQL. To do this, we would need to transform the joblib file to an SQL querable format such as JSON or CSV (out of scope for this workshop). - Finally we want to return our dataframe, but create a new column indicating what rows were used for training and those for training. 5. Viewing our output of this model: - + 6. 
Let’s pop back over to Snowflake and check that our logistic regression model has been stored in our `MODELSTAGE` using the command: @@ -1573,10 +1573,10 @@ If you haven’t seen code like this before or use joblib files to save machine list @modelstage ``` - + 7. To investigate the commands run as part of `train_test_position` script, navigate to Snowflake query history to view it **Activity > Query History**. We can view the portions of query that we wrote such as `create or replace stage MODELSTAGE`, but we also see additional queries that Snowflake uses to interpret python code. - + ### Predicting on new data @@ -1731,7 +1731,7 @@ Since the output of our Python models are tables, we can test SQL and Python mod ### Generic tests 1. To implement generic out-of-the-box tests dbt comes with, we can use YAML files to specify information about our models. To add generic tests to our aggregates model, create a file called `aggregates.yml`, copy the code block below into the file, and save. - + ```yaml version: 2 @@ -1762,7 +1762,7 @@ Since the output of our Python models are tables, we can test SQL and Python mod ### Using macros for testing 1. Under your `macros` folder, create a new file and name it `test_all_values_gte_zero.sql`. Copy the code block below and save the file. For clarity, “gte” is an abbreviation for greater than or equal to. - + ```sql {% macro test_all_values_gte_zero(table, column) %} @@ -1776,7 +1776,7 @@ Since the output of our Python models are tables, we can test SQL and Python mod 3. We use the `{% macro %}` to indicate the start of the macro and `{% endmacro %}` for the end. The text after the beginning of the macro block is the name we are giving the macro to later call it. In this case, our macro is called `test_all_values_gte_zero`. Macros take in *arguments* to pass through, in this case the `table` and the `column`. 
In the body of the macro, we see an SQL statement that is using the `ref` function to dynamically select the table and then the column. You can always view macros without having to run them by using `dbt run-operation`. You can learn more [here](https://docs.getdbt.com/reference/commands/run-operation). 4. Great, now we want to reference this macro as a test! Let’s create a new test file called `macro_pit_stops_mean_is_positive.sql` in our `tests` folder. - + 5. Copy the following code into the file and save: @@ -1805,7 +1805,7 @@ These tests are defined in `.sql` files, typically in your `tests` directory (as Let’s add a custom test that asserts that the moving average of the lap time over the last 5 years is greater than zero (it’s impossible to have time less than 0!). It is easy to assume if this is not the case the data has been corrupted. 1. Create a file `lap_times_moving_avg_assert_positive_or_null.sql` under the `tests` folder. - + 2. Copy the following code and save the file: @@ -1841,11 +1841,11 @@ Let’s add a custom test that asserts that the moving average of the lap time o dbt test --select fastest_pit_stops_by_constructor lap_times_moving_avg ``` - + 3. All 4 of our tests passed (yay for clean data)! To understand the SQL being run against each of our tables, we can click into the details of the test. 4. Navigating into the **Details** of the `unique_fastest_pit_stops_by_constructor_name`, we can see that each line `constructor_name` should only have one row. - + ## Document your dbt project @@ -1865,17 +1865,17 @@ To start, let’s look back at our `intermediate.md` file. We can see that we pr ``` This will generate the documentation for your project. Click the book button, as shown in the screenshot below to access the docs. - + 2. Go to our project area and view `int_results`. View the description that we created in our doc block. - + 3. View the mini-lineage that looks at the model we are currently selected on (`int_results` in this case). - + 4. 
In our `dbt_project.yml`, we configured `node_colors` depending on the file directory. Starting in dbt v1.3, we can see how our lineage in our docs looks. By color coding your project, it can help you cluster together similar models or steps and more easily troubleshoot. - + ## Deploy your code @@ -1890,18 +1890,18 @@ Now that we've completed testing and documenting our work, we're ready to deploy 1. Before getting started, let's make sure that we've committed all of our work to our feature branch. If you still have work to commit, you'll be able to select the **Commit and push**, provide a message, and then select **Commit** again. 2. Once all of your work is committed, the git workflow button will now appear as **Merge to main**. Select **Merge to main** and the merge process will automatically run in the background. - + 3. When it's completed, you should see the git button read **Create branch** and the branch you're currently looking at will become **main**. 4. Now that all of our development work has been merged to the main branch, we can build our deployment job. Given that our production environment and production job were created automatically for us through Partner Connect, all we need to do here is update some default configurations to meet our needs. 5. In the menu, select **Deploy** **> Environments** - + 6. You should see two environments listed and you'll want to select the **Deployment** environment then **Settings** to modify it. 7. Before making any changes, let's touch on what is defined within this environment. The Snowflake connection shows the credentials that dbt Cloud is using for this environment and in our case they are the same as what was created for us through Partner Connect. Our deployment job will build in our `PC_DBT_DB` database and use the default Partner Connect role and warehouse to do so. The deployment credentials section also uses the info that was created in our Partner Connect job to create the credential connection. 
However, it is using the same default schema that we've been using as the schema for our development environment. 8. Let's update the schema to create a new schema specifically for our production environment. Click **Edit** to allow you to modify the existing field values. Navigate to **Deployment Credentials >** **schema.** 9. Update the schema name to **production**. Remember to select **Save** after you've made the change. - + 10. By updating the schema for our production environment to **production**, it ensures that our deployment job for this environment will build our dbt models in the **production** schema within the `PC_DBT_DB` database as defined in the Snowflake Connection section. 11. Now let's switch over to our production job. Click on the deploy tab again and then select **Jobs**. You should see an existing and preconfigured **Partner Connect Trial Job**. Similar to the environment, click on the job, then select **Settings** to modify it. Let's take a look at the job to understand it before making changes. @@ -1912,11 +1912,11 @@ Now that we've completed testing and documenting our work, we're ready to deploy So, what are we changing then? Just the name! Click **Edit** to allow you to make changes. Then update the name of the job to **Production Job** to denote this as our production deployment job. After that's done, click **Save**. 12. Now let's go to run our job. Clicking on the job name in the path at the top of the screen will take you back to the job run history page where you'll be able to click **Run run** to kick off the job. If you encounter any job failures, try running the job again before further troubleshooting. - - + + 13. Let's go over to Snowflake to confirm that everything built as expected in our production schema. Refresh the database objects in your Snowflake account and you should see the production schema now within our default Partner Connect database. 
If you click into the schema and everything ran successfully, you should be able to see all of the models we developed. - + ### Conclusion diff --git a/website/docs/guides/dremio-lakehouse.md b/website/docs/guides/dremio-lakehouse.md index c8a8c4cf83b..378ec857f6a 100644 --- a/website/docs/guides/dremio-lakehouse.md +++ b/website/docs/guides/dremio-lakehouse.md @@ -143,7 +143,7 @@ dremioSamples: Now that you have a running environment and a completed job, you can view the data in Dremio and expand your code. This is a snapshot of the project structure in an IDE: - + ## About the schema.yml @@ -156,7 +156,7 @@ The models correspond to both weather and trip data respectively and will be joi The sources can be found by navigating to the **Object Storage** section of the Dremio Cloud UI. - + ## About the models @@ -170,11 +170,11 @@ The sources can be found by navigating to the **Object Storage** section of the When you run the dbt job, it will create a **dev** space folder that has all the data assets created. This is what you will see in Dremio Cloud UI. Spaces in Dremio is a way to organize data assets which map to business units or data products. - + Open the **Application folder** and you will see the output of the simple transformation we did using dbt. - + ## Query the data diff --git a/website/docs/guides/manual-install-qs.md b/website/docs/guides/manual-install-qs.md index 082d23bc77e..fcd1e5e9599 100644 --- a/website/docs/guides/manual-install-qs.md +++ b/website/docs/guides/manual-install-qs.md @@ -67,7 +67,7 @@ $ pwd 5. Use a code editor like Atom or VSCode to open the project directory you created in the previous steps, which we named jaffle_shop. The content includes folders and `.sql` and `.yml` files generated by the `init` command.
- +
6. dbt provides the following values in the `dbt_project.yml` file: @@ -126,7 +126,7 @@ $ dbt debug ```
- +
### FAQs @@ -150,7 +150,7 @@ dbt run You should have an output that looks like this:
- +
## Commit your changes @@ -197,7 +197,7 @@ $ git checkout -b add-customers-model 4. From the command line, enter `dbt run`.
- +
When you return to the BigQuery console, you can `select` from this model. diff --git a/website/docs/guides/redshift-qs.md b/website/docs/guides/redshift-qs.md index 5f3395acb82..c81a4d247a5 100644 --- a/website/docs/guides/redshift-qs.md +++ b/website/docs/guides/redshift-qs.md @@ -43,17 +43,17 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen 3. Click **Next** for each page until you reach the **Select acknowledgement** checkbox. Select **I acknowledge that AWS CloudFormation might create IAM resources with custom names** and click **Create Stack**. You should land on the stack page with a CREATE_IN_PROGRESS status. - + 4. When the stack status changes to CREATE_COMPLETE, click the **Outputs** tab on the top to view information that you will use throughout the rest of this guide. Save those credentials for later by keeping this open in a tab. 5. Type `Redshift` in the search bar at the top and click **Amazon Redshift**. - + 6. Confirm that your new Redshift cluster is listed in **Cluster overview**. Select your new cluster. The cluster name should begin with `dbtredshiftcluster-`. Then, click **Query Data**. You can choose the classic query editor or v2. We will be using the v2 version for the purpose of this guide. - + 7. You might be asked to Configure account. For this sandbox environment, we recommend selecting “Configure account”. @@ -63,9 +63,9 @@ You can check out [dbt Fundamentals](https://courses.getdbt.com/courses/fundamen - **User name** — `dbtadmin` - **Password** — Use the autogenerated `RSadminpassword` from the output of the stack and save it for later. - + - + 9. Click **Create connection**. @@ -80,15 +80,15 @@ Now we are going to load our sample data into the S3 bucket that our Cloudformat 2. Now we are going to use the S3 bucket that you created with CloudFormation and upload the files. Go to the search bar at the top and type in `S3` and click on S3. 
There will be sample data in the bucket already, feel free to ignore it or use it for other modeling exploration. The bucket will be prefixed with `dbt-data-lake`. - + 3. Click on the `name of the bucket` S3 bucket. If you have multiple S3 buckets, this will be the bucket that was listed under “Workshopbucket” on the Outputs page. - + 4. Click **Upload**. Drag the three files into the UI and click the **Upload** button. - + 5. Remember the name of the S3 bucket for later. It should look like this: `s3://dbt-data-lake-xxxx`. You will need it for the next section. 6. Now let’s go back to the Redshift query editor. Search for Redshift in the search bar, choose your cluster, and select Query data. @@ -171,7 +171,7 @@ Now we are going to load our sample data into the S3 bucket that our Cloudformat - **Port** — `5439` - **Database** — `dbtworkshop`.
- +
5. Set your development credentials. These credentials will be used by dbt Cloud to connect to Redshift. Those credentials (as provided in your CloudFormation output) will be: @@ -179,7 +179,7 @@ Now we are going to load our sample data into the S3 bucket that our Cloudformat - **Password** — This is the autogenerated password that you used earlier in the guide - **Schema** — dbt Cloud automatically generates a schema name for you. By convention, this is `dbt_`. This is the schema connected directly to your development environment, and it's where your models will be built when running dbt within the Cloud IDE.
- +
6. Click **Test Connection**. This verifies that dbt Cloud can access your Redshift cluster. diff --git a/website/docs/guides/refactoring-legacy-sql.md b/website/docs/guides/refactoring-legacy-sql.md index b12baac95cd..a339e523020 100644 --- a/website/docs/guides/refactoring-legacy-sql.md +++ b/website/docs/guides/refactoring-legacy-sql.md @@ -44,7 +44,7 @@ While refactoring you'll be **moving around** a lot of logic, but ideally you wo To get going, you'll copy your legacy SQL query into your dbt project, by saving it in a `.sql` file under the `/models` directory of your project. - + Once you've copied it over, you'll want to `dbt run` to execute the query and populate the in your warehouse. @@ -76,7 +76,7 @@ If you're migrating multiple stored procedures into dbt, with sources you can se This allows you to consolidate modeling work on those base tables, rather than calling them separately in multiple places. - + #### Build the habit of analytics-as-code Sources are an easy way to get your feet wet using config files to define aspects of your transformation pipeline. diff --git a/website/docs/guides/set-up-ci.md b/website/docs/guides/set-up-ci.md index aa4811d9339..89d7c5a14fa 100644 --- a/website/docs/guides/set-up-ci.md +++ b/website/docs/guides/set-up-ci.md @@ -22,7 +22,7 @@ After that, there's time to get fancy, but let's walk before we run. In this guide, we're going to add a **CI environment**, where proposed changes can be validated in the context of the entire project without impacting production systems. We will use a single set of deployment credentials (like the Prod environment), but models are built in a separate location to avoid impacting others (like the Dev environment). Your git flow will look like this: - + ### Prerequisites @@ -309,7 +309,7 @@ The team at Sunrun maintained a SOX-compliant deployment in dbt while reducing t In this section, we will add a new **QA** environment. 
New features will branch off from and be merged back into the associated `qa` branch, and a member of your team (the "Release Manager") will create a PR against `main` to be validated in the CI environment before going live. The git flow will look like this: - + ### Advanced prerequisites @@ -323,7 +323,7 @@ As noted above, this branch will outlive any individual feature, and will be the See [Custom branch behavior](/docs/dbt-cloud-environments#custom-branch-behavior). Setting `qa` as your custom branch ensures that the IDE creates new branches and PRs with the correct target, instead of using `main`. - + ### 3. Create a new QA environment diff --git a/website/docs/guides/snowflake-qs.md b/website/docs/guides/snowflake-qs.md index 492609c9bcf..0401c37871f 100644 --- a/website/docs/guides/snowflake-qs.md +++ b/website/docs/guides/snowflake-qs.md @@ -143,35 +143,35 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno 1. In the Snowflake UI, click on the home icon in the upper left corner. In the left sidebar, select **Admin**. Then, select **Partner Connect**. Find the dbt tile by scrolling or by searching for dbt in the search bar. Click the tile to connect to dbt. - + If you’re using the classic version of the Snowflake UI, you can click the **Partner Connect** button in the top bar of your account. From there, click on the dbt tile to open up the connect box. - + 2. In the **Connect to dbt** popup, find the **Optional Grant** option and select the **RAW** and **ANALYTICS** databases. This will grant access for your new dbt user role to each database. Then, click **Connect**. - + - + 3. Click **Activate** when a popup appears: - + - + 4. After the new tab loads, you will see a form. If you already created a dbt Cloud account, you will be asked to provide an account name. If you haven't created account, you will be asked to provide an account name and password. - + 5. 
After you have filled out the form and clicked **Complete Registration**, you will be logged into dbt Cloud automatically. 6. From your **Account Settings** in dbt Cloud (using the gear menu in the upper right corner), choose the "Partner Connect Trial" project and select **snowflake** in the overview table. Select edit and update the fields **Database** and **Warehouse** to be `analytics` and `transforming`, respectively. - + - +
@@ -181,7 +181,7 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno 2. Enter a project name and click **Continue**. 3. For the warehouse, click **Snowflake** then **Next** to set up your connection. - + 4. Enter your **Settings** for Snowflake with: * **Account** — Find your account by using the Snowflake trial account URL and removing `snowflakecomputing.com`. The order of your account information will vary by Snowflake version. For example, Snowflake's Classic console URL might look like: `oq65696.west-us-2.azure.snowflakecomputing.com`. The AppUI or Snowsight URL might look more like: `snowflakecomputing.com/west-us-2.azure/oq65696`. In both examples, your account will be: `oq65696.west-us-2.azure`. For more information, see [Account Identifiers](https://docs.snowflake.com/en/user-guide/admin-account-identifier.html) in the Snowflake docs. @@ -192,7 +192,7 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno * **Database** — `analytics`. This tells dbt to create new models in the analytics database. * **Warehouse** — `transforming`. This tells dbt to use the transforming warehouse that was created earlier. - + 5. Enter your **Development Credentials** for Snowflake with: * **Username** — The username you created for Snowflake. The username is not your email address and is usually your first and last name together in one word. @@ -201,7 +201,7 @@ Using Partner Connect allows you to create a complete dbt account with your [Sno * **Target name** — Leave as the default. * **Threads** — Leave as 4. This is the number of simultaneous connects that dbt Cloud will make to build models concurrently. - + 6. Click **Test Connection**. This verifies that dbt Cloud can access your Snowflake account. 7. If the connection test succeeds, click **Next**. If it fails, you may need to check your Snowflake settings and credentials. 
diff --git a/website/docs/guides/starburst-galaxy-qs.md b/website/docs/guides/starburst-galaxy-qs.md index 9a6c44574cd..1822c83fa90 100644 --- a/website/docs/guides/starburst-galaxy-qs.md +++ b/website/docs/guides/starburst-galaxy-qs.md @@ -92,11 +92,11 @@ In addition to Amazon S3, Starburst Galaxy supports many other data sources. To The **Amazon S3** page should look similar to this, except for the **Authentication to S3** section which is dependant on your setup: - + 8. Click **Test connection**. This verifies that Starburst Galaxy can access your S3 bucket. 9. Click **Connect catalog** if the connection test passes. - + 10. On the **Set permissions** page, click **Skip**. You can add permissions later if you want. 11. On the **Add to cluster** page, choose the cluster you want to add the catalog to from the dropdown and click **Add to cluster**. @@ -113,7 +113,7 @@ In addition to Amazon S3, Starburst Galaxy supports many other data sources. To When done, click **Add privileges**. - + ## Create tables with Starburst Galaxy To query the Jaffle Shop data with Starburst Galaxy, you need to create tables using the Jaffle Shop data that you [loaded to your S3 bucket](#load-data-to-s3). You can do this (and run any SQL statement) from the [query editor](https://docs.starburst.io/starburst-galaxy/query/query-editor.html). @@ -121,7 +121,7 @@ To query the Jaffle Shop data with Starburst Galaxy, you need to create tables u 1. Click **Query > Query editor** on the left sidebar of the Starburst Galaxy UI. The main body of the page is now the query editor. 2. Configure the query editor so it queries your S3 bucket. In the upper right corner of the query editor, select your cluster in the first gray box and select your catalog in the second gray box: - + 3. Copy and paste these queries into the query editor. Then **Run** each query individually. @@ -181,7 +181,7 @@ To query the Jaffle Shop data with Starburst Galaxy, you need to create tables u ``` 4. 
When the queries are done, you can see the following hierarchy on the query editor's left sidebar: - + 5. Verify that the tables were created successfully. In the query editor, run the following queries: diff --git a/website/docs/reference/node-selection/graph-operators.md b/website/docs/reference/node-selection/graph-operators.md index 88d99d7b92a..8cba43e1b52 100644 --- a/website/docs/reference/node-selection/graph-operators.md +++ b/website/docs/reference/node-selection/graph-operators.md @@ -29,7 +29,7 @@ dbt run --select "3+my_model+4" # select my_model, its parents up to the ### The "at" operator The `@` operator is similar to `+`, but will also include _the parents of the children of the selected model_. This is useful in continuous integration environments where you want to build a model and all of its children, but the _parents_ of those children might not exist in the database yet. The selector `@snowplow_web_page_context` will build all three models shown in the diagram below. - + ```bash dbt run --models @my_model # select my_model, its children, and the parents of its children diff --git a/website/docs/reference/resource-configs/bigquery-configs.md b/website/docs/reference/resource-configs/bigquery-configs.md index a5198fd3487..8f323bc4236 100644 --- a/website/docs/reference/resource-configs/bigquery-configs.md +++ b/website/docs/reference/resource-configs/bigquery-configs.md @@ -379,7 +379,7 @@ models: - + ### Specifying tags BigQuery table and view *tags* can be created by supplying an empty string for the label value. 
diff --git a/website/docs/reference/resource-configs/persist_docs.md b/website/docs/reference/resource-configs/persist_docs.md index 481f25d4e95..15b1e0bdb40 100644 --- a/website/docs/reference/resource-configs/persist_docs.md +++ b/website/docs/reference/resource-configs/persist_docs.md @@ -186,8 +186,8 @@ models: Run dbt and observe that the created relation and columns are annotated with your descriptions: - - diff --git a/website/docs/reference/resource-configs/spark-configs.md b/website/docs/reference/resource-configs/spark-configs.md index 5c32fa5fc83..ce3b317f0f1 100644 --- a/website/docs/reference/resource-configs/spark-configs.md +++ b/website/docs/reference/resource-configs/spark-configs.md @@ -104,7 +104,7 @@ If no `partition_by` is specified, then the `insert_overwrite` strategy will ato - This strategy is not available when connecting via Databricks SQL endpoints (`method: odbc` + `endpoint`). - If connecting via a Databricks cluster + ODBC driver (`method: odbc` + `cluster`), you **must** include `set spark.sql.sources.partitionOverwriteMode DYNAMIC` in the [cluster Spark Config](https://docs.databricks.com/clusters/configure.html#spark-config) in order for dynamic partition replacement to work (`incremental_strategy: insert_overwrite` + `partition_by`). - + + If mixing images and text together, also consider using a docs block. diff --git a/website/docs/terms/dag.md b/website/docs/terms/dag.md index b108c68806a..c6b91300bfc 100644 --- a/website/docs/terms/dag.md +++ b/website/docs/terms/dag.md @@ -32,7 +32,7 @@ One of the great things about DAGs is that they are *visual*. You can clearly id Take this mini-DAG for an example: - + What can you learn from this DAG? Immediately, you may notice a handful of things: @@ -57,7 +57,7 @@ You can additionally use your DAG to help identify bottlenecks, long-running dat ...to name just a few. 
Understanding the factors impacting model performance can help you decide on [refactoring approaches](https://courses.getdbt.com/courses/refactoring-sql-for-modularity), [changing model materialization](https://docs.getdbt.com/blog/how-we-shaved-90-minutes-off-model#attempt-2-moving-to-an-incremental-model)s, replacing multiple joins with surrogate keys, or other methods. - + ### Modular data modeling best practices @@ -83,7 +83,7 @@ The marketing team at dbt Labs would be upset with us if we told you we think db Whether you’re using dbt Core or Cloud, dbt docs and the Lineage Graph are available to all dbt developers. The Lineage Graph in dbt Docs can show a model or source’s entire lineage, all within a visual frame. Clicking within a model, you can view the Lineage Graph and adjust selectors to only show certain models within the DAG. Analyzing the DAG here is a great way to diagnose potential inefficiencies or lack of modularity in your dbt project. - + The DAG is also [available in the dbt Cloud IDE](https://www.getdbt.com/blog/on-dags-hierarchies-and-ides/), so you and your team can refer to your lineage while you build your models. diff --git a/website/docs/terms/data-lineage.md b/website/docs/terms/data-lineage.md index 163047187ba..d0162c35616 100644 --- a/website/docs/terms/data-lineage.md +++ b/website/docs/terms/data-lineage.md @@ -69,7 +69,7 @@ Your is used to visually show upstream dependencies, the nodes Ultimately, DAGs are an effective way to see relationships between data sources, models, and dashboards. DAGs are also a great way to see visual bottlenecks, or inefficiencies in your data work (see image below for a DAG with...many bottlenecks). Data teams can additionally add [meta fields](https://docs.getdbt.com/reference/resource-configs/meta) and documentation to nodes in the DAG to add an additional layer of governance to their dbt project. 
- + :::tip Automatic > Manual diff --git a/website/snippets/quickstarts/intro-build-models-atop-other-models.md b/website/snippets/quickstarts/intro-build-models-atop-other-models.md index eeedec34892..1104461079b 100644 --- a/website/snippets/quickstarts/intro-build-models-atop-other-models.md +++ b/website/snippets/quickstarts/intro-build-models-atop-other-models.md @@ -2,4 +2,4 @@ As a best practice in SQL, you should separate logic that cleans up your data fr Now you can experiment by separating the logic out into separate models and using the [ref](/reference/dbt-jinja-functions/ref) function to build models on top of other models: - + diff --git a/website/src/components/lightbox/styles.module.css b/website/src/components/lightbox/styles.module.css index 3027a88f45a..36d59ad42a3 100644 --- a/website/src/components/lightbox/styles.module.css +++ b/website/src/components/lightbox/styles.module.css @@ -10,7 +10,7 @@ margin: 10px auto; padding-right: 10px; display: block; - max-width: 400px; + max-width: 100%; } :local(.collapsed) { From e3f4bcd5e2ef9fd7b4d606ff341b09a139e6d319 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 16 Jan 2024 12:09:22 +0000 Subject: [PATCH 34/56] adjust scale --- website/src/components/lightbox/index.js | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/src/components/lightbox/index.js b/website/src/components/lightbox/index.js index a3d211ea237..1a6aa4d319d 100644 --- a/website/src/components/lightbox/index.js +++ b/website/src/components/lightbox/index.js @@ -59,7 +59,7 @@ function Lightbox({ alt={alt ? alt : title ? title : ''} title={title ? title : ''} src={imageCacheWrapper(src)} - style={expandImage ? { transform: 'scale(1.3)', transition: 'transform 0.3s ease', zIndex: '9999' } : {}} + style={expandImage ? 
{ transform: 'scale(1.2)', transition: 'transform 0.3s ease', zIndex: '9999' } : {}} /> From 78c3d9433162652c8386c9d0cad3df9a1c8e60ce Mon Sep 17 00:00:00 2001 From: Ly Nguyen Date: Tue, 16 Jan 2024 20:04:36 -0800 Subject: [PATCH 35/56] Feedback 2 --- website/docs/docs/deploy/ci-jobs.md | 9 ++++----- website/docs/docs/deploy/continuous-integration.md | 4 ++++ 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/website/docs/docs/deploy/ci-jobs.md b/website/docs/docs/deploy/ci-jobs.md index c46a79f4f86..d449f043470 100644 --- a/website/docs/docs/deploy/ci-jobs.md +++ b/website/docs/docs/deploy/ci-jobs.md @@ -13,11 +13,10 @@ dbt Labs recommends that you create your CI job in a dedicated dbt Cloud [deploy ### Prerequisites -- You have a dbt Cloud account. -- Set up a [connection with your Git provider](/docs/cloud/git/git-configuration-in-dbt-cloud). This integration lets dbt Cloud run jobs on your behalf for job triggering. -- For the [Concurrent CI checks](/docs/deploy/continuous-integration#concurrent-ci-checks) and [Smart cancellation of stale builds](/docs/deploy/continuous-integration#smart-cancellation) features: - - Your dbt Cloud account must be on the [Team or Enterprise plan](https://www.getdbt.com/pricing/). - - You must be using one of these native Git integrations with dbt Cloud: [GitHub](/docs/cloud/git/connect-github), [GitLab](/docs/cloud/git/connect-gitlab), or [Azure DevOps](/docs/cloud/git/connect-azure-devops). If you have a GitLab integration, you need a paid or self-hosted account which includes support for GitLab webhooks and [project access tokens](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html). With GitLab Free, merge requests will invoke CI jobs but CI status updates (success or failure of the job) will not be reported back to GitLab. +- You have a dbt Cloud account. 
+- For the [Concurrent CI checks](/docs/deploy/continuous-integration#concurrent-ci-checks) and [Smart cancellation of stale builds](/docs/deploy/continuous-integration#smart-cancellation) features, your dbt Cloud account must be on the [Team or Enterprise plan](https://www.getdbt.com/pricing/). +- Set up a [connection with your Git provider](/docs/cloud/git/git-configuration-in-dbt-cloud). This integration lets dbt Cloud run jobs on your behalf for job triggering. + - If you're using a native [GitLab](/docs/cloud/git/connect-gitlab) integration, you need a paid or self-hosted account which includes support for GitLab webhooks and [project access tokens](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html). With GitLab Free, merge requests will invoke CI jobs but CI status updates (success or failure of the job) will not be reported back to GitLab. To make CI job creation easier, many options on the **CI job** page are set to default values that dbt Labs recommends that you use. If you don't want to use the defaults, you can change them. 
diff --git a/website/docs/docs/deploy/continuous-integration.md b/website/docs/docs/deploy/continuous-integration.md index 90a56d47aea..f119c30b736 100644 --- a/website/docs/docs/deploy/continuous-integration.md +++ b/website/docs/docs/deploy/continuous-integration.md @@ -32,6 +32,7 @@ The [dbt Cloud scheduler](/docs/deploy/job-scheduler) executes CI jobs different - **Concurrent CI checks** — CI runs triggered by the same dbt Cloud CI job execute concurrently (in parallel), when appropriate - **Smart cancellation of stale builds** — Automatically cancels stale, in-flight CI runs when there are new commits to the PR +- **Run slot treatment** — CI runs don't consume a run slot ### Concurrent CI checks @@ -49,3 +50,6 @@ When you push a new commit to a PR, dbt Cloud enqueues a new CI run for the late +### Run slot treatment + +CI runs for accounts on the [Enterprise, Team, and Developer plans](https://www.getdbt.com/pricing) don't consume run slots so a CI check will never block a production run. 
\ No newline at end of file From 2d851ed2b90540935052f30971974121fe6bf708 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Wed, 17 Jan 2024 13:56:03 +0000 Subject: [PATCH 36/56] revert --- website/src/components/lightbox/styles.module.css | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/src/components/lightbox/styles.module.css b/website/src/components/lightbox/styles.module.css index 36d59ad42a3..3027a88f45a 100644 --- a/website/src/components/lightbox/styles.module.css +++ b/website/src/components/lightbox/styles.module.css @@ -10,7 +10,7 @@ margin: 10px auto; padding-right: 10px; display: block; - max-width: 100%; + max-width: 400px; } :local(.collapsed) { From bab688e5ec2cdf7741a7bec971c75cd18a8bda99 Mon Sep 17 00:00:00 2001 From: Ly Nguyen Date: Wed, 17 Jan 2024 08:39:59 -0800 Subject: [PATCH 37/56] Differentiate btn free and developer plans --- website/docs/docs/deploy/continuous-integration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/deploy/continuous-integration.md b/website/docs/docs/deploy/continuous-integration.md index f119c30b736..ffe8d0c326b 100644 --- a/website/docs/docs/deploy/continuous-integration.md +++ b/website/docs/docs/deploy/continuous-integration.md @@ -52,4 +52,4 @@ When you push a new commit to a PR, dbt Cloud enqueues a new CI run for the late ### Run slot treatment -CI runs for accounts on the [Enterprise, Team, and Developer plans](https://www.getdbt.com/pricing) don't consume run slots so a CI check will never block a production run. \ No newline at end of file +CI runs for accounts on the [Enterprise, Team](https://www.getdbt.com/pricing), and [Free (trial)](https://www.getdbt.com/signup) plans don't consume run slots so a CI check will never block a production run. 
\ No newline at end of file From 75186c7a4436a2dd4cdbf7abc56ff5cad6bb9898 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Wed, 17 Jan 2024 17:24:32 +0000 Subject: [PATCH 38/56] adjust css --- website/src/components/lightbox/styles.module.css | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/src/components/lightbox/styles.module.css b/website/src/components/lightbox/styles.module.css index 3027a88f45a..bbc8ecd3872 100644 --- a/website/src/components/lightbox/styles.module.css +++ b/website/src/components/lightbox/styles.module.css @@ -10,7 +10,7 @@ margin: 10px auto; padding-right: 10px; display: block; - max-width: 400px; + max-width: 80%; } :local(.collapsed) { From 9168023269061ff63647f9545e6fa533a73a3c29 Mon Sep 17 00:00:00 2001 From: mirnawong1 <89008547+mirnawong1@users.noreply.github.com> Date: Fri, 19 Jan 2024 11:58:39 +0000 Subject: [PATCH 39/56] Update website/src/components/lightbox/styles.module.css --- website/src/components/lightbox/styles.module.css | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/src/components/lightbox/styles.module.css b/website/src/components/lightbox/styles.module.css index bbc8ecd3872..1f50a2f0427 100644 --- a/website/src/components/lightbox/styles.module.css +++ b/website/src/components/lightbox/styles.module.css @@ -25,7 +25,7 @@ margin: 10px 0 10px auto; } -:local(.hovered) { /* Add the . 
before the class name */ +:local(.hovered) { filter: drop-shadow(4px 4px 6px #aaaaaae1); transition: transform 0.3s ease; z-index: 9999; From 9fc9f97dfbb3271dca9acf7932845e87add6ceef Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Fri, 19 Jan 2024 12:47:18 +0000 Subject: [PATCH 40/56] adds scrolling varaible --- .../cloud/dbt-cloud-ide/ide-user-interface.md | 42 ++++++++--------- .../docs/cloud/dbt-cloud-ide/lint-format.md | 10 ++-- .../cloud-build-and-view-your-docs.md | 8 ++-- .../collaborate/explore-multiple-projects.md | 4 +- .../docs/collaborate/model-performance.md | 4 +- website/src/components/lightbox/index.js | 47 +++++++++++-------- 6 files changed, 62 insertions(+), 53 deletions(-) diff --git a/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md b/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md index 2038d4ad64c..8a549e40736 100644 --- a/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md +++ b/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md @@ -10,7 +10,7 @@ The [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud) is a tool fo This page offers comprehensive definitions and terminology of user interface elements, allowing you to navigate the IDE landscape with ease. - + ## Basic layout @@ -36,7 +36,7 @@ The IDE streamlines your workflow, and features a popular user interface layout * Added (A) — The IDE detects added files * Deleted (D) — The IDE detects deleted files. - + 5. **Command bar —** The Command bar, located in the lower left of the IDE, is used to invoke [dbt commands](/reference/dbt-commands). When a command is invoked, the associated logs are shown in the Invocation History Drawer. @@ -49,7 +49,7 @@ The IDE streamlines your workflow, and features a popular user interface layout The IDE features some delightful tools and layouts to make it easier for you to write dbt code and collaborate with teammates. - + 1. **File Editor —** The File Editor is where users edit code. 
Tabs break out the region for each opened file, and unsaved files are marked with a blue dot icon in the tab view. @@ -66,24 +66,24 @@ The IDE features some delightful tools and layouts to make it easier for you to ## Additional editing features - **Minimap —** A Minimap (code outline) gives you a high-level overview of your source code, which is useful for quick navigation and code understanding. A file's minimap is displayed on the upper-right side of the editor. To quickly jump to different sections of your file, click the shaded area. - + - **dbt Editor Command Palette —** The dbt Editor Command Palette displays text editing actions and their associated keyboard shortcuts. This can be accessed by pressing `F1` or right-clicking in the text editing area and selecting Command Palette. - + - **Git Diff View —** Clicking on a file in the **Changes** section of the **Version Control Menu** will open the changed file with Git Diff view. The editor will show the previous version on the left and the in-line changes made on the right. - + - **Markdown Preview console tab —** The Markdown Preview console tab shows a preview of your .md file's markdown code in your repository and updates it automatically as you edit your code. - + - **CSV Preview console tab —** The CSV Preview console tab displays the data from your CSV file in a table, which updates automatically as you edit the file in your seed directory. - + ## Console section The console section, located below the File editor, includes various console tabs and buttons to help you with tasks such as previewing, compiling, building, and viewing the . Refer to the following sub-bullets for more details on the console tabs and buttons. - + 1. **Preview button —** When you click on the Preview button, it runs the SQL in the active file editor regardless of whether you have saved it or not and sends the results to the **Results** console tab. 
You can preview a selected portion of saved or unsaved code by highlighting it and then clicking the **Preview** button. @@ -107,17 +107,17 @@ Starting from dbt v1.6 or higher, when you save changes to a model, you can comp 3. **Format button —** The editor has a **Format** button that can reformat the contents of your files. For SQL files, it uses either `sqlfmt` or `sqlfluff`, and for Python files, it uses `black`. 5. **Results tab —** The Results console tab displays the most recent Preview results in tabular format. - + 6. **Compiled Code tab —** The Compile button triggers a compile invocation that generates compiled code, which is displayed in the Compiled Code tab. - + 7. **Lineage tab —** The Lineage tab in the File Editor displays the active model's lineage or . By default, it shows two degrees of lineage in both directions (`2+model_name+2`), however, you can change it to +model+ (full DAG). - Double-click a node in the DAG to open that file in a new tab - Expand or shrink the DAG using node selection syntax. - Note, the `--exclude` flag isn't supported. - + ## Invocation history @@ -128,7 +128,7 @@ You can open the drawer in multiple ways: - Typing a dbt command and pressing enter - Or pressing Control-backtick (or Ctrl + `) - + 1. **Invocation History list —** The left-hand panel of the Invocation History Drawer displays a list of previous invocations in the IDE, including the command, branch name, command status, and elapsed time. @@ -138,7 +138,7 @@ You can open the drawer in multiple ways: 4. **Command Control button —** Use the Command Control button, located on the right side, to control your invocation and cancel or rerun a selected run. - + 5. **Node Summary tab —** Clicking on the Results Status Tabs will filter the Node Status List based on their corresponding status. 
The available statuses are Pass (successful invocation of a node), Warn (test executed with a warning), Error (database error or test failure), Skip (nodes not run due to upstream error), and Queued (nodes that have not executed yet). @@ -150,25 +150,25 @@ You can open the drawer in multiple ways: ## Modals and Menus Use menus and modals to interact with IDE and access useful options to help your development workflow. -- **Editor tab menu —** To interact with open editor tabs, right-click any tab to access the helpful options in the file tab menu. +- **Editor tab menu —** To interact with open editor tabs, right-click any tab to access the helpful options in the file tab menu. - **File Search —** You can easily search for and navigate between files using the File Navigation menu, which can be accessed by pressing Command-O or Control-O or clicking on the 🔍 icon in the File Explorer. - + - **Global Command Palette—** The Global Command Palette provides helpful shortcuts to interact with the IDE, such as git actions, specialized dbt commands, and compile, and preview actions, among others. To open the menu, use Command-P or Control-P. - + - **IDE Status modal —** The IDE Status modal shows the current error message and debug logs for the server. This also contains an option to restart the IDE. Open this by clicking on the IDE Status button. - + - **Commit Changes modal —** The Commit Changes modal is accessible via the Git Actions button to commit all changes or via the Version Control Options menu to commit individual changes. Once you enter a commit message, you can use the modal to commit and sync the selected changes. - + - **Change Branch modal —** The Change Branch modal allows users to switch git branches in the IDE. It can be accessed through the `Change Branch` link or the Git Actions button in the Version Control menu. - + - **Revert Uncommitted Changes modal —** The Revert Uncommitted Changes modal is how users revert changes in the IDE. 
This is accessible via the `Revert File` option above the Version Control Options menu, or via the Git Actions button when there are saved, uncommitted changes in the IDE. - + - **IDE Options menu —** The IDE Options menu can be accessed by clicking on the three-dot menu located at the bottom right corner of the IDE. This menu contains global options such as: diff --git a/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md b/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md index f6f2265a922..37d8c8d814e 100644 --- a/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md +++ b/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md @@ -63,7 +63,7 @@ Linting doesn't support ephemeral models in dbt v1.5 and lower. Refer to the [FA - **Fix** button — Automatically fixes linting errors in the **File editor**. When fixing is complete, you'll see a message confirming the outcome. - Use the **Code Quality** tab to view and debug any code errors. - + ### Customize linting @@ -130,7 +130,7 @@ group_by_and_order_by_style = implicit For more info on styling best practices, refer to [How we style our SQL](/best-practices/how-we-style/2-how-we-style-our-sql). ::: - + ## Format @@ -158,7 +158,7 @@ To enable sqlfmt: 6. Once you've selected the **sqlfmt** radio button, go to the console section (located below the **File editor**) to select the **Format** button. 7. The **Format** button auto-formats your code in the **File editor**. Once you've auto-formatted, you'll see a message confirming the outcome. - + ### Format YAML, Markdown, JSON @@ -169,7 +169,7 @@ To format your YAML, Markdown, or JSON code, dbt Cloud integrates with [Prettier 3. In the console section (located below the **File editor**), select the **Format** button to auto-format your code in the **File editor**. Use the **Code Quality** tab to view code errors. 4. Once you've auto-formatted, you'll see a message confirming the outcome. 
- + You can add a configuration file to customize formatting rules for YAML, Markdown, or JSON files using Prettier. The IDE looks for the configuration file based on an order of precedence. For example, it first checks for a "prettier" key in your `package.json` file. @@ -185,7 +185,7 @@ To format your Python code, dbt Cloud integrates with [Black](https://black.read 3. In the console section (located below the **File editor**), select the **Format** button to auto-format your code in the **File editor**. 4. Once you've auto-formatted, you'll see a message confirming the outcome. - + ## FAQs diff --git a/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md b/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md index e104ea8640c..0129b43f305 100644 --- a/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md +++ b/website/docs/docs/collaborate/cloud-build-and-view-your-docs.md @@ -16,7 +16,7 @@ To set up a job to generate docs: 1. In the top left, click **Deploy** and select **Jobs**. 2. Create a new job or select an existing job and click **Settings**. 3. Under "Execution Settings," select **Generate docs on run**. - + 4. Click **Save**. Proceed to [configure project documentation](#configure-project-documentation) so your project generates the documentation when this job runs. @@ -44,7 +44,7 @@ You configure project documentation to generate documentation when the job you s 3. Navigate to **Projects** and select the project that needs documentation. 4. Click **Edit**. 5. Under **Artifacts**, select the job that should generate docs when it runs. - + 6. Click **Save**. ## Generating documentation @@ -52,7 +52,7 @@ You configure project documentation to generate documentation when the job you s To generate documentation in the dbt Cloud IDE, run the `dbt docs generate` command in the Command Bar in the dbt Cloud IDE. This command will generate the Docs for your dbt project as it exists in development in your IDE session. 
- + After generating your documentation, you can click the **Book** icon above the file tree, to see the latest version of your documentation rendered in a new browser window. @@ -65,4 +65,4 @@ These generated docs always show the last fully successful run, which means that The dbt Cloud IDE makes it possible to view [documentation](/docs/collaborate/documentation) for your dbt project while your code is still in development. With this workflow, you can inspect and verify what your project's generated documentation will look like before your changes are released to production. - + diff --git a/website/docs/docs/collaborate/explore-multiple-projects.md b/website/docs/docs/collaborate/explore-multiple-projects.md index 3be35110a37..2ec7f573957 100644 --- a/website/docs/docs/collaborate/explore-multiple-projects.md +++ b/website/docs/docs/collaborate/explore-multiple-projects.md @@ -11,7 +11,7 @@ The resource-level lineage graph for a given project displays the cross-project When you view an upstream (parent) project, its public models display a counter icon in the upper right corner indicating how many downstream (child) projects depend on them. Selecting a model reveals the lineage indicating the projects dependent on that model. These counts include all projects listing the upstream one as a dependency in its `dependencies.yml`, even without a direct `{{ ref() }}`. Selecting a project node from a public model opens its detailed lineage graph, which is subject to your [permission](/docs/cloud/manage-access/enterprise-permissions). - + When viewing a downstream (child) project that imports and refs public models from upstream (parent) projects, public models will show up in the lineage graph and display an icon on the graph edge that indicates what the relationship is to a model from another project. Hovering over this icon indicates the specific dbt Cloud project that produces that model. 
Double-clicking on a model from another project opens the resource-level lineage graph of the parent project, which is subject to your permissions. @@ -43,4 +43,4 @@ When you select a project node in the graph, a project details panel opens on th - Click **Open Project Lineage** to switch to the project’s lineage graph. - Click the Share icon to copy the project panel link to your clipboard so you can share the graph with someone. - \ No newline at end of file + diff --git a/website/docs/docs/collaborate/model-performance.md b/website/docs/docs/collaborate/model-performance.md index 7ef675b4e1e..5b3b4228210 100644 --- a/website/docs/docs/collaborate/model-performance.md +++ b/website/docs/docs/collaborate/model-performance.md @@ -27,7 +27,7 @@ Each data point links to individual models in Explorer. You can view historical metadata for up to the past three months. Select the time horizon using the filter, which defaults to a two-week lookback. - + ## The Model performance tab @@ -38,4 +38,4 @@ You can view trends in execution times, counts, and failures by using the Model Clicking on a data point reveals a table listing all job runs for that day, with each row providing a direct link to the details of a specific run. 
- \ No newline at end of file + diff --git a/website/src/components/lightbox/index.js b/website/src/components/lightbox/index.js index 1a6aa4d319d..a846c51b150 100644 --- a/website/src/components/lightbox/index.js +++ b/website/src/components/lightbox/index.js @@ -2,34 +2,27 @@ import React, { useState, useEffect } from 'react'; import styles from './styles.module.css'; import imageCacheWrapper from '../../../functions/image-cache-wrapper'; -function Lightbox({ - src, - collapsed = false, - alignment = "center", - alt = undefined, - title = undefined, - width = undefined, -}) { +function Lightbox({ src, collapsed = false, alignment = "center", alt = undefined, title = undefined, width = undefined }) { const [isHovered, setIsHovered] = useState(false); const [expandImage, setExpandImage] = useState(false); + const [isScrolling, setIsScrolling] = useState(false); useEffect(() => { let timeoutId; - - if (isHovered) { - // Delay the expansion by 5 milliseconds + if (isHovered && !isScrolling) { timeoutId = setTimeout(() => { setExpandImage(true); - }, 5); + }, 300); } - - return () => { - clearTimeout(timeoutId); - }; - }, [isHovered]); + return () => clearTimeout(timeoutId); + }, [isHovered, isScrolling]); const handleMouseEnter = () => { - setIsHovered(true); + setTimeout(() => { + if (!isScrolling) { + setIsHovered(true); + } + }, 300); }; const handleMouseLeave = () => { @@ -37,6 +30,22 @@ function Lightbox({ setExpandImage(false); }; + const handleScroll = () => { + setIsScrolling(true); + setExpandImage(false); + + setTimeout(() => { + setIsScrolling(false); + }, 300); // Delay to reset scrolling state + }; + + useEffect(() => { + window.addEventListener('scroll', handleScroll); + return () => { + window.removeEventListener('scroll', handleScroll); + }; + }, []); + return ( <> @@ -59,7 +68,7 @@ function Lightbox({ alt={alt ? alt : title ? title : ''} title={title ? title : ''} src={imageCacheWrapper(src)} - style={expandImage ? 
{ transform: 'scale(1.2)', transition: 'transform 0.3s ease', zIndex: '9999' } : {}} + style={expandImage ? { transform: 'scale(1.2)', transition: 'transform 0.5s ease', zIndex: '9999' } : {}} /> From 76b60c38ef7d051a39b6cd8f34688089ba361772 Mon Sep 17 00:00:00 2001 From: Jordan Stein Date: Fri, 19 Jan 2024 13:00:59 -0800 Subject: [PATCH 41/56] fix typo and re-format sql --- website/docs/docs/build/conversion-metrics.md | 160 ++++++++---------- 1 file changed, 74 insertions(+), 86 deletions(-) diff --git a/website/docs/docs/build/conversion-metrics.md b/website/docs/docs/build/conversion-metrics.md index 2238655fbe0..8d5f7eb24bf 100644 --- a/website/docs/docs/build/conversion-metrics.md +++ b/website/docs/docs/build/conversion-metrics.md @@ -105,19 +105,18 @@ This step joins the `BUYS` table to the `VISITS` table and gets all combinations The SQL generated in these steps looks like the following: ```sql -select - v.ds, - v.user_id, - v.referrer_id, - b.ds, - b.uuid, - 1 as buys -from visits v -inner join ( - select *, uuid_string() as uuid from buys -- Adds a uuid column to uniquely identify the different rows -) b -on -v.user_id = b.user_id and v.ds <= b.ds and v.ds > b.ds - interval '7 day' +SELECT v.ds, + v.user_id, + v.referrer_id, + b.ds, + b.uuid, + 1 as buys +FROM visits v + INNER JOIN (SELECT *, uuid_string() as uuid + FROM buys -- Adds a uuid column to uniquely identify the different rows + ) b + ON + v.user_id = b.user_id and v.ds <= b.ds and v.ds > b.ds - interval '7 day' ``` The dataset returns the following (note that there are two potential conversion events for the first visit): @@ -134,19 +133,17 @@ The dataset returns the following (note that there are two potential conversion Instead of returning the raw visit values, use window functions to link conversions to the closest base event.
You can partition by the conversion source and get the `first_value` ordered by `visit ds`, descending to get the closest base event from the conversion event: ```sql -select - first_value(v.ds) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as v_ds, - first_value(v.user_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as user_id, - first_value(v.referrer_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as referrer_id, - b.ds, - b.uuid, - 1 as buys -from visits v -inner join ( - select *, uuid_string() as uuid from buys -) b -on -v.user_id = b.user_id and v.ds <= b.ds and v.ds > b.ds - interval '7 day' +SELECT first_value(v.ds) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as v_ds, + first_value(v.user_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as user_id, + first_value(v.referrer_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as referrer_id, + b.ds, + b.uuid, + 1 as buys +FROM visits v + INNER JOIN (select *, uuid_string() as uuid + from buys) b + ON + v.user_id = b.user_id and v.ds <= b.ds and v.ds > b.ds - interval '7 day' ``` @@ -168,19 +165,18 @@ To resolve this and eliminate duplicates, use a distinct select. 
The UUID also h Instead of regular select used in the [Step 2](#step-2-refine-with-window-function), use a distinct select to remove the duplicates: ```sql -select distinct - first_value(v.ds) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as v_ds, - first_value(v.user_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as user_id, - first_value(v.referrer_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as referrer_id, - b.ds, - b.uuid, - 1 as buys -from visits v -inner join ( - select *, uuid_string() as uuid from buys -) b -on -v.user_id = b.user_id and v.ds <= b.ds and v.ds > b.ds - interval '7 day'; +SELECT DISTINCT first_value(v.ds) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as v_ds, + first_value(v.user_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as user_id, + first_value(v.referrer_id) + over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as referrer_id, + b.ds, + b.uuid, + 1 as buys +FROM visits v + INNER JOIN (select *, uuid_string() as uuid + from buys) b + ON + v.user_id = b.user_id and v.ds <= b.ds and v.ds > b.ds - interval '7 day'; ``` The dataset returns the following: @@ -201,38 +197,28 @@ You now have a dataset where every conversion is connected to a visit event. To Now that you’ve tied each conversion event to a visit, you can calculate the aggregated conversions and opportunities measures. Then, you can join them to calculate the actual conversion rate. 
The SQL to calculate the conversion rate is as follows: ```sql -select - coalesce(subq_3.metric_time__day, subq_13.metric_time__day) as metric_time__day, - cast(max(subq_13.buys) as double) / cast(nullif(max(subq_3.visits), 0) as double) as visit_to_buy_conversion_rate_7d -from ( -- base measure - select - metric_time__day, - sum(visits) as mqls - from ( - select - date_trunc('day', first_contact_date) as metric_time__day, - 1 as visits - from visits - ) subq_2 - group by - metric_time__day -) subq_3 -full outer join ( -- conversion measure - select - metric_time__day, - sum(buys) as sellers - from ( - -- ... - -- The output of this subquery is the table produced in Step 3. The SQL is hidden for legibility. - -- To see the full SQL output, add --explain to your conversion metric query. - ) subq_10 - group by - metric_time__day -) subq_13 -on - subq_3.metric_time__day = subq_13.metric_time__day -group by - metric_time__day +SELECT coalesce(subq_3.metric_time__day, subq_13.metric_time__day) as metric_time__day, + cast(max(subq_13.buys) as double) / + cast(nullif(max(subq_3.visits), 0) as double) as visit_to_buy_conversion_rate_7d +FROM ( -- base measure + SELECT metric_time__day, + sum(visits) as mqls + FROM (SELECT date_trunc('day', first_contact_date) as metric_time__day, + 1 as visits + FROM visits) subq_2 + GROUP BY metric_time__day) subq_3 + FULL OUTER JOIN ( -- conversion measure + SELECT metric_time__day, + sum(buys) as sellers + FROM ( + -- ... + -- The output of this subquery is the table produced in Step 3. The SQL is hidden for legibility. + -- To see the full SQL output, add --explain to your conversion metric query. 
+ ) subq_10 + GROUP BY metric_time__day) subq_13 + ON + subq_3.metric_time__day = subq_13.metric_time__day +GROUP BY metric_time__day ``` ### Additional settings @@ -249,7 +235,7 @@ Use the following additional settings to customize your conversion metrics: To return zero in the final data set, you can set the value of a null conversion event to zero instead of null. You can add the `fill_nulls_with` parameter to your conversion metric definition like this: ```yaml -- name: vist_to_buy_conversion_rate_7_day_window +- name: visit_to_buy_conversion_rate_7_day_window description: "Conversion rate from viewing a page to making a purchase" type: conversion label: Visit to Seller Conversion Rate (7 day window) @@ -329,22 +315,24 @@ In this case, you want to set `product_id` as the constant property. You can spe You will add an additional condition to the join to make sure the constant property is the same across conversions. ```sql -select distinct - first_value(v.ds) over (partition by buy_source.ds, buy_source.user_id, buy_source.session_id order by v.ds desc rows between unbounded preceding and unbounded following) as ds, - first_value(v.user_id) over (partition by buy_source.ds, buy_source.user_id, buy_source.session_id order by v.ds desc rows between unbounded preceding and unbounded following) as user_id, - first_value(v.referrer_id) over (partition by buy_source.ds, buy_source.user_id, buy_source.session_id order by v.ds desc rows between unbounded preceding and unbounded following) as referrer_id, - buy_source.uuid, - 1 as buys -from {{ source_schema }}.fct_view_item_details v -inner join +SELECT DISTINCT first_value(v.ds) + over (partition by buy_source.ds, buy_source.user_id, buy_source.session_id order by v.ds desc rows between unbounded preceding and unbounded following) as ds, + first_value(v.user_id) + over (partition by buy_source.ds, buy_source.user_id, buy_source.session_id order by v.ds desc rows between unbounded preceding and unbounded following) as 
user_id, + first_value(v.referrer_id) + over (partition by buy_source.ds, buy_source.user_id, buy_source.session_id order by v.ds desc rows between unbounded preceding and unbounded following) as referrer_id, + buy_source.uuid, + 1 as buys +FROM {{ source_schema }}.fct_view_item_details v +INNER JOIN ( - select *, {{ generate_random_uuid() }} as uuid from {{ source_schema }}.fct_purchases + SELECT *, {{ generate_random_uuid() }} as uuid FROM {{ source_schema }}.fct_purchases ) buy_source -on - v.user_id = buy_source.user_id - and v.ds <= buy_source.ds - and v.ds > buy_source.ds - interval '7 day' - and buy_source.product_id = v.product_id --Joining on the constant property product_id +ON + v.user_id = buy_source.user_id + AND v.ds <= buy_source.ds + AND v.ds > buy_source.ds - INTERVAL '7 day' + AND buy_source.product_id = v.product_id --Joining on the constant property product_id ``` From 6a32ba682ad541a89b71060a130ef1ea1d5f3979 Mon Sep 17 00:00:00 2001 From: rpourzand Date: Fri, 19 Jan 2024 17:17:23 -0800 Subject: [PATCH 42/56] Update sl-graphql.md --- website/docs/docs/dbt-cloud-apis/sl-graphql.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-graphql.md b/website/docs/docs/dbt-cloud-apis/sl-graphql.md index fc095985160..9bef3c039f3 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-graphql.md +++ b/website/docs/docs/dbt-cloud-apis/sl-graphql.md @@ -513,7 +513,7 @@ Assuming the user is querying `metric_0` and `metric_1` together, a valid filter Invalid filters would be: - * ` "{{ TimeDimension('metric_time') }} > '2020-01-01'"` — metrics in the query b are defined based on measures with different grains. + * ` "{{ TimeDimension('metric_time') }} > '2020-01-01'"` — metrics in the query are defined based on measures with different grains. * `"{{ TimeDimension('metric_time', 'month') }} > '2020-01-01'"` — `metric_1` is not available at a month grain. 
From 1d053634a3325ef1d6b1102e431de7f3bc01b240 Mon Sep 17 00:00:00 2001 From: rpourzand Date: Fri, 19 Jan 2024 17:18:33 -0800 Subject: [PATCH 43/56] Update sl-graphql.md --- website/docs/docs/dbt-cloud-apis/sl-graphql.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-graphql.md b/website/docs/docs/dbt-cloud-apis/sl-graphql.md index 9bef3c039f3..230f84b0b98 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-graphql.md +++ b/website/docs/docs/dbt-cloud-apis/sl-graphql.md @@ -468,7 +468,7 @@ mutation { } ``` -For both `TimeDimension()` and `Dimension()` objects, the grain is only required in the WHERE filter if the aggregation time dimensions for the measures & metrics associated with the where filter have different grains. +For both `TimeDimension()`, the grain is only required in the WHERE filter if the aggregation time dimensions for the measures & metrics associated with the where filter have different grains. For example, consider this Semantic model and Metric configuration, which contains two metrics that are aggregated across different time grains. This example shows a single semantic model, but the same goes for metrics across more than one semantic model. 
From c8f43c006c149612f060e1238371d004c981e099 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Mon, 22 Jan 2024 12:28:53 +0000 Subject: [PATCH 44/56] update code to update handleScroll --- website/src/components/detailsToggle/index.js | 56 ++++++++++++------- 1 file changed, 37 insertions(+), 19 deletions(-) diff --git a/website/src/components/detailsToggle/index.js b/website/src/components/detailsToggle/index.js index 076d053846c..6e4f3f7768c 100644 --- a/website/src/components/detailsToggle/index.js +++ b/website/src/components/detailsToggle/index.js @@ -3,33 +3,51 @@ import styles from './styles.module.css'; function detailsToggle({ children, alt_header = null }) { const [isOn, setOn] = useState(false); - const [hoverActive, setHoverActive] = useState(true); + const [isScrolling, setIsScrolling] = useState(false); // New state to track scrolling const [hoverTimeout, setHoverTimeout] = useState(null); const handleToggleClick = () => { - setHoverActive(true); // Disable hover when clicked setOn(current => !current); // Toggle the current state -}; - -const handleMouseEnter = () => { - if (isOn) return; // Ignore hover if already open - setHoverActive(true); // Enable hover - const timeout = setTimeout(() => { - if (hoverActive) setOn(true); - }, 500); - setHoverTimeout(timeout); -}; - -const handleMouseLeave = () => { - if (!isOn) { + }; + + const handleMouseEnter = () => { + if (isOn || isScrolling) return; // Ignore hover if already open or if scrolling + const timeout = setTimeout(() => { + if (!isScrolling) setOn(true); + }, 700); // + setHoverTimeout(timeout); + }; + + const handleMouseLeave = () => { + if (!isOn) { + clearTimeout(hoverTimeout); + setOn(false); + } + }; + + const handleScroll = () => { + if (isOn) { + setIsScrolling(true); clearTimeout(hoverTimeout); setOn(false); } -}; -useEffect(() => { - return () => clearTimeout(hoverTimeout); -}, [hoverTimeout]); + // Reset scrolling state after a delay + setTimeout(() => { + setIsScrolling(false); + }, 
800); + }; + + useEffect(() => { + window.addEventListener('scroll', handleScroll); + return () => { + window.removeEventListener('scroll', handleScroll); + }; + }, []); + + useEffect(() => { + return () => clearTimeout(hoverTimeout); + }, [hoverTimeout]); return (
From c2cfcdd59b52e3d7e303ce32d93e67b055a8c816 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Mon, 22 Jan 2024 13:13:19 +0000 Subject: [PATCH 45/56] update --- website/src/components/detailsToggle/index.js | 59 ++++++++++++++----- .../detailsToggle/styles.module.css | 1 + 2 files changed, 44 insertions(+), 16 deletions(-) diff --git a/website/src/components/detailsToggle/index.js b/website/src/components/detailsToggle/index.js index 6e4f3f7768c..b88d955587c 100644 --- a/website/src/components/detailsToggle/index.js +++ b/website/src/components/detailsToggle/index.js @@ -1,41 +1,68 @@ import React, { useState, useEffect } from 'react'; import styles from './styles.module.css'; -function detailsToggle({ children, alt_header = null }) { +function DetailsToggle({ children, alt_header = null, index }) { const [isOn, setOn] = useState(false); - const [isScrolling, setIsScrolling] = useState(false); // New state to track scrolling + const [isScrolling, setIsScrolling] = useState(false); const [hoverTimeout, setHoverTimeout] = useState(null); const handleToggleClick = () => { - setOn(current => !current); // Toggle the current state + if (isOn) { + closeToggle(); + } else { + closePreviouslyOpenToggle(); + openToggle(); + } }; const handleMouseEnter = () => { - if (isOn || isScrolling) return; // Ignore hover if already open or if scrolling + if (isScrolling) return; const timeout = setTimeout(() => { - if (!isScrolling) setOn(true); - }, 700); // + closeToggle(); // close the toggle first + openToggle(); // then open it + }, 700); setHoverTimeout(timeout); + closePreviouslyOpenToggle(); }; - + const handleMouseLeave = () => { if (!isOn) { clearTimeout(hoverTimeout); - setOn(false); + closeToggle(); } }; const handleScroll = () => { - if (isOn) { - setIsScrolling(true); - clearTimeout(hoverTimeout); - setOn(false); - } + setIsScrolling(true); + clearTimeout(hoverTimeout); + + if (isOn) { + closeToggle(); + } - // Reset scrolling state after a delay setTimeout(() => { 
setIsScrolling(false); - }, 800); + }, 300); + }; + + const openToggle = () => { + setOn(true); + }; + + const closeToggle = () => { + setOn(false); + }; + + const closePreviouslyOpenToggle = () => { + const allToggles = document.querySelectorAll('.detailsToggle'); + allToggles.forEach((toggle, toggleIndex) => { + if (toggleIndex !== index) { + const toggleIsOpen = toggle.querySelector(`.${styles.body}`).style.display === 'block'; + if (toggleIsOpen) { + toggle.querySelector(`.${styles.body}`).style.display = 'none'; + } + } + }); }; useEffect(() => { @@ -72,4 +99,4 @@ function detailsToggle({ children, alt_header = null }) { ); } -export default detailsToggle; +export default DetailsToggle; diff --git a/website/src/components/detailsToggle/styles.module.css b/website/src/components/detailsToggle/styles.module.css index b9c2f09df06..b3f4a4886dc 100644 --- a/website/src/components/detailsToggle/styles.module.css +++ b/website/src/components/detailsToggle/styles.module.css @@ -27,6 +27,7 @@ width: 1.25rem; vertical-align: middle; transition: transform 0.3s; /* Smooth transition for toggle icon */ + } :local(.toggleUpsideDown) { From 61d73daa03e54ede4c5985d08db2848d4400045e Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Mon, 22 Jan 2024 15:53:52 +0000 Subject: [PATCH 46/56] update handlescroll --- website/src/components/detailsToggle/index.js | 53 +++++-------------- 1 file changed, 12 insertions(+), 41 deletions(-) diff --git a/website/src/components/detailsToggle/index.js b/website/src/components/detailsToggle/index.js index b88d955587c..fc51af35428 100644 --- a/website/src/components/detailsToggle/index.js +++ b/website/src/components/detailsToggle/index.js @@ -1,68 +1,39 @@ import React, { useState, useEffect } from 'react'; import styles from './styles.module.css'; -function DetailsToggle({ children, alt_header = null, index }) { +function detailsToggle({ children, alt_header = null }) { const [isOn, setOn] = useState(false); - const [isScrolling, 
setIsScrolling] = useState(false); + const [isScrolling, setIsScrolling] = useState(false); // New state to track scrolling const [hoverTimeout, setHoverTimeout] = useState(null); const handleToggleClick = () => { - if (isOn) { - closeToggle(); - } else { - closePreviouslyOpenToggle(); - openToggle(); - } + setOn(current => !current); // Toggle the current state }; const handleMouseEnter = () => { - if (isScrolling) return; + if (isOn || isScrolling) return; // Ignore hover if already open or if scrolling const timeout = setTimeout(() => { - closeToggle(); // close the toggle first - openToggle(); // then open it - }, 700); + if (!isScrolling) setOn(true); + }, 700); // setHoverTimeout(timeout); - closePreviouslyOpenToggle(); }; - + const handleMouseLeave = () => { if (!isOn) { clearTimeout(hoverTimeout); - closeToggle(); + setOn(false); } }; const handleScroll = () => { setIsScrolling(true); clearTimeout(hoverTimeout); + //setOn(false); - if (isOn) { - closeToggle(); - } - + // Reset scrolling state after a delay setTimeout(() => { setIsScrolling(false); - }, 300); - }; - - const openToggle = () => { - setOn(true); - }; - - const closeToggle = () => { - setOn(false); - }; - - const closePreviouslyOpenToggle = () => { - const allToggles = document.querySelectorAll('.detailsToggle'); - allToggles.forEach((toggle, toggleIndex) => { - if (toggleIndex !== index) { - const toggleIsOpen = toggle.querySelector(`.${styles.body}`).style.display === 'block'; - if (toggleIsOpen) { - toggle.querySelector(`.${styles.body}`).style.display = 'none'; - } - } - }); + }, 800); }; useEffect(() => { @@ -99,4 +70,4 @@ function DetailsToggle({ children, alt_header = null, index }) { ); } -export default DetailsToggle; +export default detailsToggle; From 6bb0e87d080b782f4c744db47d283da8ba606c2b Mon Sep 17 00:00:00 2001 From: "Leona B. 
Campbell" <3880403+runleonarun@users.noreply.github.com> Date: Mon, 22 Jan 2024 12:37:00 -0800 Subject: [PATCH 47/56] Update semantic-models.md --- website/docs/docs/build/semantic-models.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/semantic-models.md b/website/docs/docs/build/semantic-models.md index 5c6883cdcee..afb877db504 100644 --- a/website/docs/docs/build/semantic-models.md +++ b/website/docs/docs/build/semantic-models.md @@ -20,7 +20,7 @@ Semantic models are the foundation for data definition in MetricFlow, which powe -Semantic models have 6 components and this page explains the definitions with some examples: +Here we describe the Semantic model components with examples: | Component | Description | Type | | --------- | ----------- | ---- | From c18a17231444c62572e7b1cba86a4d09f3dae203 Mon Sep 17 00:00:00 2001 From: Ly Nguyen <107218380+nghi-ly@users.noreply.github.com> Date: Mon, 22 Jan 2024 14:21:59 -0800 Subject: [PATCH 48/56] Update website/docs/docs/deploy/ci-jobs.md Co-authored-by: mirnawong1 <89008547+mirnawong1@users.noreply.github.com> --- website/docs/docs/deploy/ci-jobs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/deploy/ci-jobs.md b/website/docs/docs/deploy/ci-jobs.md index d449f043470..9b96bb4b766 100644 --- a/website/docs/docs/deploy/ci-jobs.md +++ b/website/docs/docs/deploy/ci-jobs.md @@ -16,7 +16,7 @@ dbt Labs recommends that you create your CI job in a dedicated dbt Cloud [deploy - You have a dbt Cloud account. - For the [Concurrent CI checks](/docs/deploy/continuous-integration#concurrent-ci-checks) and [Smart cancellation of stale builds](/docs/deploy/continuous-integration#smart-cancellation) features, your dbt Cloud account must be on the [Team or Enterprise plan](https://www.getdbt.com/pricing/). - Set up a [connection with your Git provider](/docs/cloud/git/git-configuration-in-dbt-cloud). 
This integration lets dbt Cloud run jobs on your behalf for job triggering. - - If you're using a native [GitLab](/docs/cloud/git/connect-gitlab) integration, you need a paid or self-hosted account which includes support for GitLab webhooks and [project access tokens](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html). With GitLab Free, merge requests will invoke CI jobs but CI status updates (success or failure of the job) will not be reported back to GitLab. + - If you're using a native [GitLab](/docs/cloud/git/connect-gitlab) integration, you need a paid or self-hosted account that includes support for GitLab webhooks and [project access tokens](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html). If you're using GitLab Free, merge requests will trigger CI jobs but CI job status updates (success or failure of the job) will not be reported back to GitLab. To make CI job creation easier, many options on the **CI job** page are set to default values that dbt Labs recommends that you use. If you don't want to use the defaults, you can change them. From 4596048dbb61765b05fd66aa649de9ed47882406 Mon Sep 17 00:00:00 2001 From: Ly Nguyen Date: Mon, 22 Jan 2024 14:34:17 -0800 Subject: [PATCH 49/56] Feedback --- website/docs/docs/deploy/continuous-integration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/deploy/continuous-integration.md b/website/docs/docs/deploy/continuous-integration.md index ffe8d0c326b..22686c44bd2 100644 --- a/website/docs/docs/deploy/continuous-integration.md +++ b/website/docs/docs/deploy/continuous-integration.md @@ -52,4 +52,4 @@ When you push a new commit to a PR, dbt Cloud enqueues a new CI run for the late ### Run slot treatment -CI runs for accounts on the [Enterprise, Team](https://www.getdbt.com/pricing), and [Free (trial)](https://www.getdbt.com/signup) plans don't consume run slots so a CI check will never block a production run. 
\ No newline at end of file +For accounts on the [Enterprise or Team](https://www.getdbt.com/pricing) plans, CI runs won't consume run slots. This guarantees a CI check will never block a production run. \ No newline at end of file From 357e5aa2e3489305f2f759638d9c18716ac5d50b Mon Sep 17 00:00:00 2001 From: mirnawong1 <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 23 Jan 2024 11:26:05 +0000 Subject: [PATCH 50/56] Update website/docs/docs/dbt-cloud-apis/sl-jdbc.md --- website/docs/docs/dbt-cloud-apis/sl-jdbc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md index 7296d1541ab..7b2b1438ae9 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md +++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md @@ -378,7 +378,7 @@ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'], }} ``` -If you are ordering by an object that's been operated on (e.g., you changed the the granularity of the time dimension), or you are using the full object notation, descending order must look like: +If you are ordering by an object that's been operated on (for example, you changed the granularity of the time dimension), or you are using the full object notation, descending order must look like: ```bash select * from {{ From 192c2f01be98826adebacd7dd2627763796fc75a Mon Sep 17 00:00:00 2001 From: mirnawong1 <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 23 Jan 2024 11:27:47 +0000 Subject: [PATCH 51/56] Update website/docs/docs/dbt-cloud-apis/sl-jdbc.md --- website/docs/docs/dbt-cloud-apis/sl-jdbc.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md index 7b2b1438ae9..6ff846fa7c7 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md +++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md @@ -315,7 +315,7 @@ Assuming the user is querying `metric_0`
and `metric_1` together in a single req
 * `"{{ TimeDimension('metric_time', 'year') }} > '2020-01-01'"`

-Invalid Filters would be:
+Invalid filters would be:

 * `"{{ TimeDimension('metric_time') }} > '2020-01-01'"` — metrics in the query are defined based on measures with different grains.

From 250b3ba78d8cb759089a0fa3bc045ae88dd8e587 Mon Sep 17 00:00:00 2001
From: mirnawong1 <89008547+mirnawong1@users.noreply.github.com>
Date: Tue, 23 Jan 2024 11:28:47 +0000
Subject: [PATCH 52/56] Update website/docs/docs/dbt-cloud-apis/sl-jdbc.md

---
 website/docs/docs/dbt-cloud-apis/sl-jdbc.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md
index 6ff846fa7c7..6916f283898 100644
--- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md
+++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md
@@ -415,7 +415,7 @@ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
 ## FAQs
-
+ When you select a dimension on its own, such as `metric_time` you can use the shorthand method which doesn't need the “Dimension” syntax.

From fb3b8f69b3f762e16736aed47bf1d5fb8111bf33 Mon Sep 17 00:00:00 2001
From: mirnawong1 <89008547+mirnawong1@users.noreply.github.com>
Date: Tue, 23 Jan 2024 11:30:05 +0000
Subject: [PATCH 53/56] Update website/docs/docs/dbt-cloud-apis/sl-graphql.md

---
 website/docs/docs/dbt-cloud-apis/sl-graphql.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/docs/dbt-cloud-apis/sl-graphql.md b/website/docs/docs/dbt-cloud-apis/sl-graphql.md
index 7a857f45087..177883ce19b 100644
--- a/website/docs/docs/dbt-cloud-apis/sl-graphql.md
+++ b/website/docs/docs/dbt-cloud-apis/sl-graphql.md
@@ -481,7 +481,7 @@ mutation {
 }
 ```

-For both `TimeDimension()`, the grain is only required in the WHERE filter if the aggregation time dimensions for the measures & metrics associated with the where filter have different grains.
+For both `TimeDimension()`, the grain is only required in the WHERE filter if the aggregation time dimensions for the measures and metrics associated with the where filter have different grains.

 For example, consider this Semantic model and Metric configuration, which contains two metrics that are aggregated across different time grains. This example shows a single semantic model, but the same goes for metrics across more than one semantic model.
From 610da3815e16fef976b47ff3d4b39b4b2980b154 Mon Sep 17 00:00:00 2001
From: mirnawong1 <89008547+mirnawong1@users.noreply.github.com>
Date: Tue, 23 Jan 2024 11:31:04 +0000
Subject: [PATCH 54/56] Update sl-graphql.md

---
 website/docs/docs/dbt-cloud-apis/sl-graphql.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/docs/dbt-cloud-apis/sl-graphql.md b/website/docs/docs/dbt-cloud-apis/sl-graphql.md
index 177883ce19b..f26a19a1930 100644
--- a/website/docs/docs/dbt-cloud-apis/sl-graphql.md
+++ b/website/docs/docs/dbt-cloud-apis/sl-graphql.md
@@ -466,7 +466,7 @@ The `where` filter takes a list argument (or a string for a single input). Depen
 - `Entity()` — Used for entities like primary and foreign keys, such as `Entity('order_id')`.

-Note: If you prefer a `where` clause with a more explicit path, you can optionally use `TimeDimension()` to separate out categorical dimensions from time ones. The `TimeDimension` input takes the time dimension and optionally the granularity level. `TimeDimension('metric_time', 'month')`.
+Note: If you prefer a `where` clause with a more explicit path, you can optionally use `TimeDimension()` to separate categorical dimensions from time ones. The `TimeDimension` input takes the time dimension and optionally the granularity level. `TimeDimension('metric_time', 'month')`.
 ```graphql
 mutation {

From 7f3cc44c400ac2b56c265e4fb7a4d4292d6efa08 Mon Sep 17 00:00:00 2001
From: mirnawong1 <89008547+mirnawong1@users.noreply.github.com>
Date: Tue, 23 Jan 2024 11:33:20 +0000
Subject: [PATCH 55/56] Update sl-jdbc.md

---
 website/docs/docs/dbt-cloud-apis/sl-jdbc.md | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md
index 6916f283898..5ef0c071c10 100644
--- a/website/docs/docs/dbt-cloud-apis/sl-jdbc.md
+++ b/website/docs/docs/dbt-cloud-apis/sl-jdbc.md
@@ -92,9 +92,9 @@ select * from {{

-Use this query to fetch dimension values for one or multiple metrics and single dimension.
+Use this query to fetch dimension values for one or multiple metrics and a single dimension.

-Note, `metrics` is a required argument that lists one or multiple metrics in it, and a single dimension.
+Note, `metrics` is a required argument that lists one or multiple metrics, and a single dimension.

 ```bash
 select * from {{
@@ -105,9 +105,9 @@ semantic_layer.dimension_values(metrics=['food_order_amount'], group_by=['custom

-Use this query to fetch queryable granularities for a list of metrics. This API request allows you to only show the time granularities that make sense for the primary time dimension of the metrics (such as `metric_time`), but if you want queryable granularities for other time dimensions, you can use the `dimensions()` call, and find the column queryable_granularities.
+You can use this query to fetch queryable granularities for a list of metrics. This API request allows you to only show the time granularities that make sense for the primary time dimension of the metrics (such as `metric_time`), but if you want queryable granularities for other time dimensions, you can use the `dimensions()` call, and find the column queryable_granularities.

-Note, `metrics` is a required argument that lists one or multiple metrics in it.
+Note, `metrics` is a required argument that lists one or multiple metrics.

 ```bash
 select * from {{
@@ -124,7 +124,7 @@ select * from {{

 Use this query to fetch available metrics given dimensions. This command is essentially the opposite of getting dimensions given a list of metrics.

-Note, `group_by` is a required argument that lists one or multiple dimensions in it.
+Note, `group_by` is a required argument that lists one or multiple dimensions.

 ```bash
 select * from {{
@@ -137,7 +137,7 @@ select * from {{

-Use this example query to fetch available granularities for all time dimesensions (the similar queryable granularities API call only returns granularities for the primary time dimensions for metrics). The following call is a derivative of the `dimensions()` call and specifically selects the granularities field.
+You can use this example query to fetch available granularities for all time dimensions (the similar queryable granularities API call only returns granularities for the primary time dimensions for metrics). The following call is a derivative of the `dimensions()` call and specifically selects the granularity field.
 ```bash
 select NAME, QUERYABLE_GRANULARITIES from {{
@@ -344,7 +344,7 @@ where=["{{ Dimension('metric_time').grain('month') }} >= '2017-03-09'", "{{ Dime

 ### Query with a limit

-Use the following example to query using a `limit` or `order_by` clauses:
+Use the following example to query using a `limit` or `order_by` clause:

 ```bash
 select * from {{
@@ -356,7 +356,7 @@ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],

 ### Query with Order By Examples

-Order By can take a basic string that's a Dimension, Metric, or Entity and this will default to ascending order
+Order By can take a basic string that's a Dimension, Metric, or Entity, and this will default to ascending order

 ```bash
 select * from {{
@@ -367,7 +367,7 @@ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
 }}
 ```

-For descending order, you can add a `-` sign in front of the object. However, you can only use this short hand notation if you aren't operating on the object or using the full object notation.
+For descending order, you can add a `-` sign in front of the object. However, you can only use this short-hand notation if you aren't operating on the object or using the full object notation.
 ```bash
 select * from {{
@@ -378,7 +378,7 @@ semantic_layer.query(metrics=['food_order_amount', 'order_gross_profit'],
 }}
 ```

-If you are ordering by an object that's been operated on (for example, you changed the the granularity of the time dimension), or you are using the full object notation, descending order must look like:
+If you are ordering by an object that's been operated on (for example, you changed the granularity of the time dimension), or you are using the full object notation, descending order must look like:

 ```bash
 select * from {{

From 7fef1110ba5a6681a46ea47b5ba5cc713cedbeec Mon Sep 17 00:00:00 2001
From: mirnawong1
Date: Tue, 23 Jan 2024 11:58:39 +0000
Subject: [PATCH 56/56] update sql

---
 website/docs/docs/build/conversion-metrics.md | 160 ++++++++++--------
 1 file changed, 85 insertions(+), 75 deletions(-)

diff --git a/website/docs/docs/build/conversion-metrics.md b/website/docs/docs/build/conversion-metrics.md
index 8d5f7eb24bf..5b63a6bbbf1 100644
--- a/website/docs/docs/build/conversion-metrics.md
+++ b/website/docs/docs/build/conversion-metrics.md
@@ -105,18 +105,19 @@ This step joins the `BUYS` table to the `VISITS` table and gets all combinations
 The SQL generated in these steps looks like the following:

 ```sql
-SELECT v.ds,
-       v.user_id,
-       v.referrer_id,
-       b.ds,
-       b.uuid,
-       1 as buys
-FROM visits v
-  INNER JOIN (SLECT *, uuid_string() as uuid
-    FROM buys -- Adds a uuid column to uniquely identify the different rows
-  ) b
-  ON
-    v.user_id = b.user_id and v.ds <= b.ds and v.ds > b.ds - interval '7 day'
+select
+  v.ds,
+  v.user_id,
+  v.referrer_id,
+  b.ds,
+  b.uuid,
+  1 as buys
+from visits v
+inner join (
+  select *, uuid_string() as uuid from buys -- Adds a uuid column to uniquely identify the different rows
+) b
+on
+v.user_id = b.user_id and v.ds <= b.ds and v.ds > b.ds - interval '7 days'
 ```

 The dataset returns the following (note that there are two potential conversion events for the first visit):
@@ -133,18 +134,19 @@ The dataset
returns the following (note that there are two potential conversion
 Instead of returning the raw visit values, use window functions to link conversions to the closest base event. You can partition by the conversion source and get the `first_value` ordered by `visit ds`, descending to get the closest base event from the conversion event:

 ```sql
-SELECT first_value(v.ds) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as v_ds,
-       first_value(v.user_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as user_id,
-       first_value(v.referrer_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as referrer_id,
-       b.ds,
-       b.uuid,
-       1 as buys
-FROM visits v
-  INNER JOIN (select *, uuid_string() as uuid
-    from buys) b
-  ON
-    v.user_id = b.user_id and v.ds <= b.ds and v.ds > b.ds - interval '7 day'
-
+select
+  first_value(v.ds) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as v_ds,
+  first_value(v.user_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as user_id,
+  first_value(v.referrer_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as referrer_id,
+  b.ds,
+  b.uuid,
+  1 as buys
+from visits v
+inner join (
+  select *, uuid_string() as uuid from buys
+) b
+on
+v.user_id = b.user_id and v.ds <= b.ds and v.ds > b.ds - interval '7 day'
 ```

 The dataset returns the following:
@@ -165,18 +167,19 @@ To resolve this and eliminate duplicates, use a distinct select.
The UUID also h
 Instead of regular select used in the [Step 2](#step-2-refine-with-window-function), use a distinct select to remove the duplicates:

 ```sql
-SELECT DISTINCT first_value(v.ds) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as v_ds,
-       first_value(v.user_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as user_id,
-       first_value(v.referrer_id)
-       over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as referrer_id,
-       b.ds,
-       b.uuid,
-       1 as buys
-FROM visits v
-  INNER JOIN (select *, uuid_string() as uuid
-    from buys) b
-  ON
-    v.user_id = b.user_id and v.ds <= b.ds and v.ds > b.ds - interval '7 day';
+select distinct
+  first_value(v.ds) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as v_ds,
+  first_value(v.user_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as user_id,
+  first_value(v.referrer_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as referrer_id,
+  b.ds,
+  b.uuid,
+  1 as buys
+from visits v
+inner join (
+  select *, uuid_string() as uuid from buys
+) b
+on
+v.user_id = b.user_id and v.ds <= b.ds and v.ds > b.ds - interval '7 day';
 ```

 The dataset returns the following:
@@ -197,28 +200,38 @@ You now have a dataset where every conversion is connected to a visit event. To
 Now that you’ve tied each conversion event to a visit, you can calculate the aggregated conversions and opportunities measures. Then, you can join them to calculate the actual conversion rate.
The SQL to calculate the conversion rate is as follows:

 ```sql
-SELECT coalesce(subq_3.metric_time__day, subq_13.metric_time__day) as metric_time__day,
-       cast(max(subq_13.buys) as double) /
-       cast(nullif(max(subq_3.visits), 0) as double) as visit_to_buy_conversion_rate_7d
-FROM ( -- base measure
-  SELECT metric_time__day,
-         sum(visits) as mqls
-  FROM (SELECT date_trunc('day', first_contact_date) as metric_time__day,
-               1 as visits
-        FROM visits) subq_2
-  GROUP BY metric_time__day) subq_3
-  FULL OUTER JOIN ( -- conversion measure
-    SELECT metric_time__day,
-           sum(buys) as sellers
-    FROM (
-      -- ...
-      -- The output of this subquery is the table produced in Step 3. The SQL is hidden for legibility.
-      -- To see the full SQL output, add --explain to your conversion metric query.
-    ) subq_10
-    GROUP BY metric_time__day) subq_13
-  ON
-    subq_3.metric_time__day = subq_13.metric_time__day
-GROUP BY metric_time__day
+select
+  coalesce(subq_3.metric_time__day, subq_13.metric_time__day) as metric_time__day,
+  cast(max(subq_13.buys) as double) / cast(nullif(max(subq_3.visits), 0) as double) as visit_to_buy_conversion_rate_7d
+from ( -- base measure
+  select
+    metric_time__day,
+    sum(visits) as mqls
+  from (
+    select
+      date_trunc('day', first_contact_date) as metric_time__day,
+      1 as visits
+    from visits
+  ) subq_2
+  group by
+    metric_time__day
+) subq_3
+full outer join ( -- conversion measure
+  select
+    metric_time__day,
+    sum(buys) as sellers
+  from (
+    -- ...
+    -- The output of this subquery is the table produced in Step 3. The SQL is hidden for legibility.
+    -- To see the full SQL output, add --explain to your conversion metric query.
+  ) subq_10
+  group by
+    metric_time__day
+) subq_13
+on
+  subq_3.metric_time__day = subq_13.metric_time__day
+group by
+  metric_time__day
 ```

 ### Additional settings
@@ -315,25 +328,22 @@ In this case, you want to set `product_id` as the constant property.
You can spe
 You will add an additional condition to the join to make sure the constant property is the same across conversions.

 ```sql
-SELECT DISTINCT first_value(v.ds)
-       over (partition by buy_source.ds, buy_source.user_id, buy_source.session_id order by v.ds desc rows between unbounded preceding and unbounded following) as ds,
-       first_value(v.user_id)
-       over (partition by buy_source.ds, buy_source.user_id, buy_source.session_id order by v.ds desc rows between unbounded preceding and unbounded following) as user_id,
-       first_value(v.referrer_id)
-       over (partition by buy_source.ds, buy_source.user_id, buy_source.session_id order by v.ds desc rows between unbounded preceding and unbounded following) as referrer_id,
-       buy_source.uuid,
-       1 as buys
-FROM {{ source_schema }}.fct_view_item_details v
-INNER JOIN
+select distinct
+  first_value(v.ds) over (partition by buy_source.ds, buy_source.user_id, buy_source.session_id order by v.ds desc rows between unbounded preceding and unbounded following) as ds,
+  first_value(v.user_id) over (partition by buy_source.ds, buy_source.user_id, buy_source.session_id order by v.ds desc rows between unbounded preceding and unbounded following) as user_id,
+  first_value(v.referrer_id) over (partition by buy_source.ds, buy_source.user_id, buy_source.session_id order by v.ds desc rows between unbounded preceding and unbounded following) as referrer_id,
+  buy_source.uuid,
+  1 as buys
+from {{ source_schema }}.fct_view_item_details v
+inner join
 (
-  SELECT *, {{ generate_random_uuid() }} as uuid FROM {{ source_schema }}.fct_purchases
+  select *, {{ generate_random_uuid() }} as uuid from {{ source_schema }}.fct_purchases
 ) buy_source
-ON
-  v.user_id = buy_source.user_id
-  AND v.ds <= buy_source.ds
-  AND v.ds > buy_source.ds - INTERVAL '7 day'
-  AND buy_source.product_id = v.product_id --Joining on the constant property product_id
-
+on
+  v.user_id = buy_source.user_id
+  and v.ds <= buy_source.ds
+  and v.ds > buy_source.ds - interval '7 day'
+  and buy_source.product_id = v.product_id --Joining on the constant property product_id
 ```