dbt-teradata 1.8.0 changes (#5677)
## What are you changing in this pull request and why?
Updated the setup and config pages for dbt-teradata 1.8.0.

---------

Co-authored-by: Talla <[email protected]>
Co-authored-by: Leona B. Campbell <[email protected]>
Co-authored-by: Mirna Wong <[email protected]>
Co-authored-by: Anders <[email protected]>
5 people authored Jul 2, 2024
1 parent f6a9e83 commit d540936
Showing 2 changed files with 148 additions and 5 deletions.
22 changes: 18 additions & 4 deletions website/docs/docs/core/connect-data-platform/teradata-setup.md
@@ -39,14 +39,16 @@ import SetUpPages from '/snippets/_setup-pages-intro.md';
|1.5.x | ❌ | ✅ | ✅ | ✅ | ✅ | ✅
|1.6.x | ❌ | ❌ | ✅ | ✅ | ✅ | ✅
|1.7.x | ❌ | ❌ | ✅ | ✅ | ✅ | ✅
|1.8.x | ❌ | ❌ | ✅ | ✅ | ✅ | ✅

## dbt dependent packages version compatibility

| dbt-teradata | dbt-core | dbt-teradata-util | dbt-util |
|--------------|------------|-------------------|----------------|
| 1.2.x | 1.2.x | 0.1.0 | 0.9.x or below |
| 1.6.7 | 1.6.7 | 1.1.1 | 1.1.1 |
| 1.7.1 | 1.7.3 | 1.1.1 | 1.1.1 |
| 1.7.x | 1.7.x | 1.1.1 | 1.1.1 |
| 1.8.x | 1.8.x | 1.1.1 | 1.1.1 |


### Connecting to Teradata
@@ -124,10 +126,11 @@ Parameter | Default | Type | Description
`sslmode` | `"PREFER"` | string | Specifies the mode for connections to the database. Equivalent to the Teradata JDBC Driver `SSLMODE` connection parameter.<br/>&bull; `DISABLE` disables HTTPS/TLS connections and uses only non-TLS connections.<br/>&bull; `ALLOW` uses non-TLS connections unless the database requires HTTPS/TLS connections.<br/>&bull; `PREFER` uses HTTPS/TLS connections unless the database does not offer HTTPS/TLS connections.<br/>&bull; `REQUIRE` uses only HTTPS/TLS connections.<br/>&bull; `VERIFY-CA` uses only HTTPS/TLS connections and verifies that the server certificate is valid and trusted.<br/>&bull; `VERIFY-FULL` uses only HTTPS/TLS connections, verifies that the server certificate is valid and trusted, and verifies that the server certificate matches the database hostname.
`sslprotocol` | `"TLSv1.2"` | string | Specifies the TLS protocol for HTTPS/TLS connections. Equivalent to the Teradata JDBC Driver `SSLPROTOCOL` connection parameter.
`teradata_values` | `"true"` | quoted boolean | Controls whether `str` or a more specific Python data type is used for certain result set column value types.
`query_band` | `"org=teradata-internal-telem;appname=dbt;"` | string | Specifies the Query Band string to be set for each SQL request.

For the full description of the connection parameters see https://github.com/Teradata/python-driver#connection-parameters.
Refer to [connection parameters](https://github.com/Teradata/python-driver#connection-parameters) for the full description of the connection parameters.
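
As a hedged sketch, the parameters in the table above might appear in a `profiles.yml` target as follows. The profile name, host, credentials, and schema are placeholders, and the values chosen for `sslmode`, `sslprotocol`, and `query_band` are illustrative, not recommendations.

```yaml
# profiles.yml (illustrative; profile name and credentials are placeholders)
my_teradata_profile:
  target: dev
  outputs:
    dev:
      type: teradata
      host: <host>
      user: <username>
      password: <password>
      schema: <database_name>
      tmode: ANSI
      # Optional connection parameters from the table above
      sslmode: REQUIRE        # default is PREFER
      sslprotocol: TLSv1.2    # same as the default
      query_band: "org=teradata-internal-telem;appname=dbt;"
```

Omitting the optional keys leaves the defaults listed in the table in effect.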

## Supported Features
## Supported features

### Materializations

@@ -141,8 +144,12 @@ The following incremental materialization strategies are supported:
* `append` (default)
* `delete+insert`
* `merge`
* `valid_history` (early access)

To learn more about dbt incremental strategies please check [the dbt incremental strategy documentation](/docs/build/incremental-strategy).
:::info
- To learn more about dbt incremental strategies, refer to [the dbt incremental strategy documentation](/docs/build/incremental-strategy).
- To learn more about `valid_history` incremental strategy, refer to [Teradata configs](/reference/resource-configs/teradata-configs).
:::
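
A minimal sketch of selecting one of the strategies listed above for a folder of models in `dbt_project.yml`. The project name `my_project`, the `staging` folder, and the `id` key column are hypothetical.

```yaml
# dbt_project.yml (illustrative; project, folder, and column names are hypothetical)
models:
  my_project:
    staging:
      +materialized: incremental
      +incremental_strategy: delete+insert   # or append (default), merge
      +unique_key: id
```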

### Commands

@@ -225,6 +232,13 @@ For using cross-DB macros, teradata-utils as a macro namespace will not be used,

`last_day` in `teradata_utils`, unlike the corresponding macro in `dbt_utils`, doesn't support `quarter` datepart.

<VersionBlock firstVersion="1.8">

dbt-teradata 1.8.0 and later versions support unit tests, enabling you to validate SQL models and logic with a small set of static inputs before going to production. This feature enhances test-driven development and boosts developer efficiency and code reliability. Learn more about dbt unit tests [here](/docs/build/unit-tests).
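
As an illustration only, a unit test is declared in a YAML file next to the model. The sketch below assumes a hypothetical model `fct_orders` that sums `amount` per `order_id` from a hypothetical upstream model `stg_order_payments`; none of these names come from the dbt-teradata documentation.

```yaml
# models/schema.yml (illustrative; model and column names are hypothetical)
unit_tests:
  - name: test_order_total_is_summed
    model: fct_orders
    given:
      - input: ref('stg_order_payments')
        rows:
          - {order_id: 1, amount: 10}
          - {order_id: 1, amount: 5}
    expect:
      rows:
        - {order_id: 1, total_amount: 15}
```

Running `dbt test --select fct_orders` would then execute this unit test along with any other tests attached to the model.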


</VersionBlock>

## Limitations

### Transaction mode
131 changes: 130 additions & 1 deletion website/docs/reference/resource-configs/teradata-configs.md
@@ -262,6 +262,29 @@ Loading CSVs using dbt's seed functionality is not performant for large files. C
+use_fastload: true
```

## Snapshots

Snapshots use the [HASHROW function](https://docs.teradata.com/r/Enterprise_IntelliFlex_VMware/SQL-Functions-Expressions-and-Predicates/Hash-Related-Functions/HASHROW/HASHROW-Function-Syntax) of the Teradata database to generate a unique hash value for the `dbt_scd_id` column.

To use your own hash UDF, set the `snapshot_hash_udf` configuration option in the snapshot model; it defaults to `HASHROW`. You can provide a value in the form `<database_name.hash_udf_name>`. If you provide only `hash_udf_name`, the UDF is resolved in the same schema that the model runs in.

For example, in the `snapshots/snapshot_example.sql` file:

```sql
{% snapshot snapshot_example %}
{{
  config(
    target_schema='snapshots',
    unique_key='id',
    strategy='check',
    check_cols=["c2"],
    snapshot_hash_udf='GLOBAL_FUNCTIONS.hash_md5'
  )
}}
select * from {{ ref('order_payments') }}
{% endsnapshot %}
```
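
The same option could plausibly be applied to every snapshot in a project from `dbt_project.yml`. This is a sketch that assumes the adapter reads `snapshot_hash_udf` from project-level configuration in the same way as other snapshot configs; `my_project` is a hypothetical project name.

```yaml
# dbt_project.yml (illustrative; assumes project-level config is honored for this option)
snapshots:
  my_project:
    +snapshot_hash_udf: GLOBAL_FUNCTIONS.hash_md5
```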

#### Grants

Grants are supported in dbt-teradata adapter with release version 1.2.0 and above. You can use grants to manage access to the datasets you're producing with dbt. To implement these permissions, define grants as resource configs on each model, seed, or snapshot. Define the default grants that apply to the entire project in your `dbt_project.yml`, and define model-specific grants within each model's SQL or YAML file.
@@ -289,7 +312,113 @@ Another e.g. for adding multiple grants:
```
> :information_source: `copy_grants` is not supported in Teradata.

More on Grants can be found at https://docs.getdbt.com/reference/resource-configs/grants
Refer to [grants](/reference/resource-configs/grants) for more information on Grants.

## Query Band
You can set the query band in dbt-teradata at three levels:
1. Profiles level: In the `profiles.yml` file, set `query_band` as in the following example:

```yaml
query_band: 'application=dbt;'
```

2. Project level: In the `dbt_project.yml` file, set `query_band` as in the following example:

```yaml
models:
  Project_name:
    +query_band: "app=dbt;model={model};"
```
3. Model level: Set it in the model SQL file or in the model-level configuration in a YAML file:

```sql
{{ config( query_band='sql={model};' ) }}
```
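
For the YAML form of the model-level setting, a minimal sketch of a properties file follows. The file path and the model name `stg_orders` are hypothetical.

```yaml
# models/schema.yml (illustrative; `stg_orders` is a hypothetical model name)
version: 2

models:
  - name: stg_orders
    config:
      query_band: "sql={model};"
```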

You can set `query_band` at any single level or at all levels. With a profiles-level `query_band`, dbt-teradata sets the query band once when the session starts; project-level and model-level settings then update it with their respective configuration.

If a key-value pair uses `{model}` as its value, dbt-teradata replaces `{model}` with the model name, which is useful for telemetry tracking of SQL and for DBQL logging.

```yaml
models:
  Project_name:
    +query_band: "app=dbt;model={model};"
```
- For example, if the model being run is `stg_orders`, `{model}` is replaced with `stg_orders` at runtime.
- If no `query_band` is set, the default is `org=teradata-internal-telem;appname=dbt;`.

## valid_history incremental materialization strategy
_This strategy is available in early access._

This strategy manages historical data in Teradata by maintaining, for each record in the target table, a period column that records when the record was valid.
In temporal databases, valid time is crucial for applications like historical reporting, ML training datasets, and forensic analysis.

```sql
{{
  config(
    materialized='incremental',
    unique_key='id',
    on_schema_change='fail',
    incremental_strategy='valid_history',
    valid_from='valid_from_column',
    history_column_in_target='history_period_column'
  )
}}
```

The `valid_history` incremental strategy requires the following parameters:
* `valid_from` &mdash; Column in the source table of **timestamp** datatype indicating when each record became valid.
* `history_column_in_target` &mdash; Column in the target table of **period** datatype that tracks history.
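
The same settings could also be expressed in a properties file instead of the model's `config()` block. This is a sketch that assumes the adapter accepts these options from YAML configuration as it does from SQL; the model name `orders_history` is hypothetical, and the column names are the placeholders used above.

```yaml
# models/schema.yml (illustrative; `orders_history` is a hypothetical model name)
version: 2

models:
  - name: orders_history
    config:
      materialized: incremental
      unique_key: id
      on_schema_change: fail
      incremental_strategy: valid_history
      valid_from: valid_from_column                     # timestamp column in the source
      history_column_in_target: history_period_column   # period column in the target
```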

The `valid_history` strategy in dbt-teradata performs the following steps to keep historical data accurate:
* Remove duplicates and conflicting values from the source data:
  * This step cleans the data for further processing by eliminating redundant or conflicting records.
  * A ranking mechanism, implemented with the SQL `RANK()` function, ensures that only the highest-priority records are retained.
* Identify and adjust overlapping time slices:
  * Overlapping time periods in the data are detected and corrected to maintain a consistent, non-overlapping timeline.
* Manage records that need to be overwritten or split based on the source and target data:
  * This step handles cases where records in the source data overlap with, or need to replace, records in the target data, so that the historical timeline remains accurate.
* Use the `TD_NORMALIZE_MEET` function to compact history:
  * This function normalizes and compacts the history by merging adjacent time periods.
* Delete existing overlapping records from the target table:
  * Before new or updated records are inserted, any existing records in the target table that overlap with the new data are removed to prevent conflicts.
* Insert the processed data into the target table:
  * Finally, the cleaned and adjusted data is inserted into the target table, so that the historical data is up to date and reflects the intended timeline.


Together, these steps keep the history in the target table consistent and non-overlapping while compacting adjacent periods.

The following illustration shows sample source data and the corresponding target data:

```sql
-- Source data
pk | valid_from | value_txt1 | value_txt2
======================================================================
1 | 2024-03-01 00:00:00.0000 | A | x1
1 | 2024-03-12 00:00:00.0000 | B | x1
1 | 2024-03-12 00:00:00.0000 | B | x2
1 | 2024-03-25 00:00:00.0000 | A | x2
2 | 2024-03-01 00:00:00.0000 | A | x1
2 | 2024-03-12 00:00:00.0000 | C | x1
2 | 2024-03-12 00:00:00.0000 | D | x1
2 | 2024-03-13 00:00:00.0000 | C | x1
2 | 2024-03-14 00:00:00.0000 | C | x1
-- Target data
pk | valid_period | value_txt1 | value_txt2
===================================================================================================
1 | PERIOD(TIMESTAMP)[2024-03-01 00:00:00.0, 2024-03-12 00:00:00.0] | A | x1
1 | PERIOD(TIMESTAMP)[2024-03-12 00:00:00.0, 2024-03-25 00:00:00.0] | B | x1
1 | PERIOD(TIMESTAMP)[2024-03-25 00:00:00.0, 9999-12-31 23:59:59.9999] | A | x2
2 | PERIOD(TIMESTAMP)[2024-03-01 00:00:00.0, 2024-03-12 00:00:00.0] | A | x1
2 | PERIOD(TIMESTAMP)[2024-03-12 00:00:00.0, 9999-12-31 23:59:59.9999] | C | x1
```


:::info
The target table must already exist before you run the model. Create it with the necessary columns, including a column of period datatype that tracks the history, before running the dbt model.
:::

## Common Teradata-specific tasks
* *collect statistics* - when a table is created or modified significantly, there might be a need to tell Teradata to collect statistics for the optimizer. It can be done using `COLLECT STATISTICS` command. You can perform this step using dbt's `post-hooks`, e.g.: