From d58f48c5cc00119f175468b17c488917d5cc1e23 Mon Sep 17 00:00:00 2001
From: Sultan Al-Sultan
Date: Thu, 20 Jun 2024 12:57:24 +0300
Subject: [PATCH 1/6] Update spark-setup.md (#5676)

Updating caveats section in `/connect-data-platform/spark-setup` to include a caveat when using the Thrift connection method with a cluster that doesn't have a `default` namespace.

For more context check: https://github.com/dbt-labs/dbt-spark/issues/1036

---------

Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com>
---
 .../docs/core/connect-data-platform/spark-setup.md | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/website/docs/docs/core/connect-data-platform/spark-setup.md b/website/docs/docs/core/connect-data-platform/spark-setup.md
index 9d9e0c9d5fb..3b1429c246b 100644
--- a/website/docs/docs/core/connect-data-platform/spark-setup.md
+++ b/website/docs/docs/core/connect-data-platform/spark-setup.md
@@ -48,7 +48,7 @@ $ python -m pip install "dbt-spark[session]"

For further info, refer to the GitHub repository: {frontMatter.meta.github_repo}

-## Connection Methods
+## Connection methods
 dbt-spark can connect to Spark clusters by four different methods:
@@ -211,7 +211,7 @@ Spark can be customized using [Application Properties](https://spark.apache.org/
 ### Usage with EMR
 To connect to Apache Spark running on an Amazon EMR cluster, you will need to run `sudo /usr/lib/spark/sbin/start-thriftserver.sh` on the master node of the cluster to start the Thrift server (see [the docs](https://aws.amazon.com/premiumsupport/knowledge-center/jdbc-connection-emr/) for more information). You will also need to connect to port 10001, which will connect to the Spark backend Thrift server; port 10000 will instead connect to a Hive backend, which will not work correctly with dbt.
-### Supported Functionality
+### Supported functionality
 Most dbt Core functionality is supported, but some features are only available on Delta Lake (Databricks).
@@ -220,3 +220,9 @@ Delta-only features:
 1. Incremental model updates by `unique_key` instead of `partition_by` (see [`merge` strategy](/reference/resource-configs/spark-configs#the-merge-strategy))
 2. [Snapshots](/docs/build/snapshots)
 3. [Persisting](/reference/resource-configs/persist_docs) column-level descriptions as database comments
+
+### Default namespace with Thrift connection method
+
+If your Spark cluster doesn't have a default namespace, metadata queries that run before any dbt workflow will fail, causing the entire workflow to fail, even if your configurations are correct. The metadata queries fail because there's no default namespace in which to run them.
+
+To debug, review the debug-level logs to confirm the query dbt is running when it encounters the error: `dbt run --debug` or `logs/dbt.log`.
From 6c4682a7eaece9f8a6d01a62878451bf86b38c9c Mon Sep 17 00:00:00 2001
From: "Vijayakumar Z @ 'Z"
Date: Thu, 20 Jun 2024 15:45:31 +0530
Subject: [PATCH 2/6] Update incremental-models.md (#5306)

If event_time is NULL or the table is truncated, the condition will always be true and load all records.

This is an edge case. If the final table is truncated, the comparison value is NULL (`select * from table where event_time > NULL`), so the filter returns no results. The table has no data in it, but dbt still treats the run as an incremental load, and the model fails. This change avoids that situation; dbt has nothing built in to handle it.

## What are you changing in this pull request and why?

## Checklist
- [ ] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines.
- [ ] For [docs versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#about-versioning), review how to [version a whole page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) and [version a block of content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content).
- [ ] Add a checklist item for anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch."
Adding or removing pages (delete if not applicable):
- [ ] Add/remove page in `website/sidebars.js`
- [ ] Provide a unique filename for new pages
- [ ] Add an entry for deleted pages in `website/vercel.json`
- [ ] Run link testing locally with `npm run build` to update the links that point to deleted pages

---------

Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com>
---
 website/docs/docs/build/incremental-models.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/website/docs/docs/build/incremental-models.md b/website/docs/docs/build/incremental-models.md
index 5a0900b4cdd..21215437d89 100644
--- a/website/docs/docs/build/incremental-models.md
+++ b/website/docs/docs/build/incremental-models.md
@@ -70,7 +70,8 @@ from {{ ref('app_data_events') }}
 -- this filter will only be applied on an incremental run
 -- (uses >= to include records whose timestamp occurred since the last run of this model)
- where event_time >= (select coalesce(max(event_time), '1900-01-01') from {{ this }})
+ -- (If event_time is NULL or the table is truncated, the condition will always be true and load all records)
+where event_time >= (select coalesce(max(event_time),'1900-01-01'::TIMESTAMP) from {{ this }} )
 {% endif %}
 ```
From 0b5812385a4cbc58b1fe681d997fbf1f2c35959a Mon Sep 17 00:00:00 2001
From: ialdg <39755524+ialdg@users.noreply.github.com>
Date: Thu, 20 Jun 2024 12:23:56 +0200
Subject: [PATCH 3/6] Update deploy-jobs.md (#5183)

Hi. This proposal of modification has several broken, misspelled or missing link comments for you to consider. Regards. IL.

## What are you changing in this pull request and why?

## Checklist
- [x] Review the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines.
- [ ] For [docs versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#about-versioning), review how to [version a whole page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) and [version a block of content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content).
- [ ] Add a checklist item for anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch."

Adding or removing pages (delete if not applicable):
- [ ] Add/remove page in `website/sidebars.js`
- [ ] Provide a unique filename for new pages
- [ ] Add an entry for deleted pages in `website/vercel.json`
- [ ] Run link testing locally with `npm run build` to update the links that point to deleted pages

---------

Co-authored-by: Mirna Wong <89008547+mirnawong1@users.noreply.github.com>
---
 website/docs/docs/deploy/deploy-jobs.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/website/docs/docs/deploy/deploy-jobs.md b/website/docs/docs/deploy/deploy-jobs.md
index 7a59336736d..96ec8a1932e 100644
--- a/website/docs/docs/deploy/deploy-jobs.md
+++ b/website/docs/docs/deploy/deploy-jobs.md
@@ -10,7 +10,7 @@ You can use deploy jobs to build production data assets. Deploy jobs make it eas
 - Commit SHA
 - Environment name
 - Sources and documentation info, if applicable
-- Job run details, including run timing, [model timing data](#model-timing), and [artifacts](/docs/deploy/artifacts)
+- Job run details, including run timing, [model timing data](/docs/deploy/run-visibility#model-timing), and [artifacts](/docs/deploy/artifacts)
 - Detailed run steps with logs and their run step statuses
 You can create a deploy job and configure it to run on [scheduled days and times](#schedule-days) or enter a [custom cron schedule](#cron-schedule).
From 2896787514d38e6703e3b9fed468fcc0a9751786 Mon Sep 17 00:00:00 2001
From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com>
Date: Thu, 20 Jun 2024 17:37:50 +0100
Subject: [PATCH 4/6] add cookbook recipe content type (#5657)

This PR adds a new content type: dbt cookbook recipes. dbt cookbook recipes provide users with practical, real-life examples and solutions based on what they're looking to do.

Some content suggestions inspired by @gwenwindflower
---
 contributing/content-types.md | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/contributing/content-types.md b/contributing/content-types.md
index 4654ada9255..38a3141ecb3 100644
--- a/contributing/content-types.md
+++ b/contributing/content-types.md
@@ -9,6 +9,7 @@ These content types can all form articles. Some content types can form sections
 * [Procedural](#procedural)
 * [Guide](#guide)
 * [Quickstart](#quickstart-guide)
+* [Cookbook recipes](#cookbook-recipes)
 ## Conceptual
@@ -165,3 +166,26 @@ Quickstart guides are generally more conversational in tone than our other docum
 Examples TBD
+
+## Cookbook recipes
+The dbt Cookbook recipes are a collection of scenario-based, real-world examples for building with dbt. Each recipe offers a practical example of using a specific feature.
+
+Code examples could be written in SQL or [Python](/docs/build/python-models), though most will be in SQL.
+
+If there are examples or guides you'd like to see, feel free to suggest them on the [documentation issues page](https://github.com/dbt-labs/docs.getdbt.com/issues/new/choose). We're also happy to accept high-quality pull requests, as long as they fit the scope of the cookbook.
+
+### Contents of a cookbook recipe article or header
+
+Cookbook recipes should contain real-life scenarios with objectives, prerequisites, detailed steps, code snippets, and outcomes, providing users with a dedicated section to implement solutions based on their needs. Cookbook recipes complement the existing guides by giving users hands-on, actionable instructions and code.
+
+Each cookbook recipe should include objectives, a clear use case, prerequisites, step-by-step instructions, code snippets, expected output, and troubleshooting tips.
+
+### Titles for cookbook recipe content
+
+Cookbook recipe headers should always start with "How to create [topic]" or "How to [verb] [topic]".
+
+### Examples of cookbook recipe content
+
+- How to calculate annual recurring revenue (ARR) using metrics in dbt
+- How to calculate customer acquisition cost (CAC) using metrics in dbt
+- How to track the total number of sale transactions using metrics in dbt
From f0be420cb4b4200498aefce3cd722ac7776d28dc Mon Sep 17 00:00:00 2001
From: "Leona B. Campbell" <3880403+runleonarun@users.noreply.github.com>
Date: Thu, 20 Jun 2024 12:37:23 -0700
Subject: [PATCH 5/6] Update add_issue_to_project.yml

---
 .github/workflows/add_issue_to_project.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/add_issue_to_project.yml b/.github/workflows/add_issue_to_project.yml
index 38a9eedc816..7972704ff70 100644
--- a/.github/workflows/add_issue_to_project.yml
+++ b/.github/workflows/add_issue_to_project.yml
@@ -12,4 +12,4 @@ jobs:
 - uses: actions/add-to-project@v0.5.0
   with:
     project-url: https://github.com/orgs/dbt-labs/projects/14
-    github-token: ${{ secrets.DOCS_SECRET }}
+    github-token: ${{ secrets.GITHUB_TOKEN }}
From 5462ebb85f43e62c74033b00b9dc7c2abbc485c3 Mon Sep 17 00:00:00 2001
From: "Leona B. Campbell" <3880403+runleonarun@users.noreply.github.com>
Date: Thu, 20 Jun 2024 13:15:32 -0700
Subject: [PATCH 6/6] Update add_issue_to_project.yml

---
 .github/workflows/add_issue_to_project.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/add_issue_to_project.yml b/.github/workflows/add_issue_to_project.yml
index 7972704ff70..38a9eedc816 100644
--- a/.github/workflows/add_issue_to_project.yml
+++ b/.github/workflows/add_issue_to_project.yml
@@ -12,4 +12,4 @@ jobs:
 - uses: actions/add-to-project@v0.5.0
   with:
     project-url: https://github.com/orgs/dbt-labs/projects/14
-    github-token: ${{ secrets.GITHUB_TOKEN }}
+    github-token: ${{ secrets.DOCS_SECRET }}