[doc][2024.2.1] CDC: Update docs for longer data retention #25291

Open
wants to merge 6 commits into
base: master
@@ -28,6 +28,14 @@ CDC retains resources (such as WAL segments) that contain information related to

Retaining resources has an impact on the system. Clients are expected to consume these transactions within configurable duration limits. Resources will be released if the duration exceeds these configured limits.

Use the [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag to control the duration for which resources are retained.
Use the [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) and [cdc_wal_retention_time_secs](../../../../reference/configuration/yb-tserver/#cdc_wal_retention_time_secs) flags to control the duration for which resources are retained.

Resources are retained for each tablet of a table that is part of a database whose changes are being consumed using a replication slot. This includes those tables that may not be currently part of the publication specification.

Starting from 2024.2.1, the default data retention period for Change Data Capture (CDC) is 8 hours, with support for a maximum retention of up to 24 hours. Prior to 2024.2.1, the default retention for CDC was 4 hours.

{{< warning title="Important" >}}
When using replica identity FULL or DEFAULT, CDC preserves previous row values for UPDATE and DELETE operations. To do this, it retains history for each row in the database by suspending compaction. Compaction is halted by setting retention barriers that prevent cleanup of history for rows that are yet to be streamed to the CDC client. These retention barriers are managed dynamically and advance only after the CDC events are streamed and explicitly acknowledged by the client, which then allows compaction of history for the streamed rows.

The [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag governs the maximum retention period, defaulting to 8 hours. Be aware that an extended interruption in CDC consumption with these replica identities may degrade read performance. Because compaction is halted in the database with these replica identities, reads must traverse multiple SST files, which makes key lookups inefficient.
{{< /warning >}}
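
The barrier behavior described in the warning can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not YugabyteDB's implementation: the `RetentionBarrier` class, its fields, and the millisecond timestamps are all hypothetical, and the only values taken from the text are the 8-hour default and the rule that the barrier advances on explicit acknowledgement.

```python
from dataclasses import dataclass

# Default cdc_intent_retention_ms stated in the text above (8 hours).
EIGHT_HOURS_MS = 8 * 60 * 60 * 1000


@dataclass
class RetentionBarrier:
    """Toy per-tablet history retention barrier (hypothetical, for illustration only)."""

    retention_ms: int = EIGHT_HOURS_MS
    barrier_ms: int = 0        # history at or after this time must be kept
    last_ack_wall_ms: int = 0  # wall-clock time of the last client acknowledgement

    def acknowledge(self, streamed_up_to_ms: int, now_ms: int) -> None:
        # The barrier only moves forward, and only on explicit acknowledgement.
        self.barrier_ms = max(self.barrier_ms, streamed_up_to_ms)
        self.last_ack_wall_ms = now_ms

    def expired(self, now_ms: int) -> bool:
        # A client that stays silent longer than the limit loses its retained resources.
        return now_ms - self.last_ack_wall_ms > self.retention_ms

    def can_compact(self, row_version_ms: int, now_ms: int) -> bool:
        # History older than the barrier has been streamed and may be cleaned up;
        # everything becomes eligible once retention has expired.
        return self.expired(now_ms) or row_version_ms < self.barrier_ms


barrier = RetentionBarrier()
barrier.acknowledge(streamed_up_to_ms=1_000, now_ms=1_000)
print(barrier.can_compact(500, now_ms=2_000))    # True: already streamed and acknowledged
print(barrier.can_compact(1_500, now_ms=2_000))  # False: not yet streamed
```

The model also shows why a stalled client hurts reads: until `acknowledge` is called again or the retention limit passes, nothing newer than the barrier can be compacted.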
@@ -528,12 +528,16 @@ You can use several flags to fine-tune YugabyteDB's CDC behavior. These flags ar

## Retaining data for longer durations

To increase retention of data for CDC, change the two flags, `cdc_intent_retention_ms` and `cdc_wal_retention_time_secs` as required.
The following flags control the retention of data required by CDC:
- `cdc_wal_retention_time_secs` (default value: 28800s)
- `cdc_intent_retention_ms` (default value: 28800000ms)

{{< warning title="Important" >}}
Starting from 2024.2.1, the default data retention period for Change Data Capture (CDC) is 8 hours, with support for a maximum retention of up to 24 hours. Prior to 2024.2.1, the default retention for CDC was 4 hours.

Longer values of `cdc_intent_retention_ms`, coupled with longer CDC lags (periods of downtime where the client is not requesting changes) can result in increased memory footprint in the YB-TServer and affect read performance.
{{< warning title="Important" >}}
When using before image modes ALL, FULL_ROW_NEW_IMAGE or MODIFIED_COLUMNS_OLD_AND_NEW_IMAGES, CDC preserves previous row values for UPDATE and DELETE operations. To do this, it retains history for each row in the database by suspending compaction. Compaction is halted by setting retention barriers that prevent cleanup of history for rows that are yet to be streamed to the CDC client. These retention barriers are managed dynamically and advance only after the CDC events are streamed and explicitly acknowledged by the client, which then allows compaction of history for the streamed rows.

The [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag governs the maximum retention period, defaulting to 8 hours. Be aware that an extended interruption in CDC consumption with these before image modes may degrade read performance. Because compaction is halted in the database with these before image modes, reads must traverse multiple SST files, which makes key lookups inefficient.
{{< /warning >}}
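
As a rough illustration of choosing values for the two flags, the following sketch converts a retention period in hours into flag strings. The helper name `cdc_retention_flags` is hypothetical, and setting both flags to the same duration is an assumption made here for simplicity; the text above only states the defaults and the 24-hour maximum.

```python
MAX_RETENTION_HOURS = 24  # maximum stated in the text above for 2024.2.1 and later


def cdc_retention_flags(hours: float) -> list[str]:
    """Return flag strings for both CDC retention flags (hypothetical helper)."""
    if not 0 < hours <= MAX_RETENTION_HOURS:
        raise ValueError(
            f"retention must be in (0, {MAX_RETENTION_HOURS}] hours, got {hours}"
        )
    seconds = round(hours * 3600)
    return [
        f"--cdc_wal_retention_time_secs={seconds}",   # unit: seconds
        f"--cdc_intent_retention_ms={seconds * 1000}",  # unit: milliseconds
    ]


print(cdc_retention_flags(8))
# ['--cdc_wal_retention_time_secs=28800', '--cdc_intent_retention_ms=28800000']
```

For 8 hours the output matches the defaults listed above, which is a quick way to confirm the unit conversion before trying a longer period.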

## Content-based routing
2 changes: 1 addition & 1 deletion docs/content/preview/reference/configuration/yb-master.md
@@ -885,7 +885,7 @@ Default: `0` (Use the same default number of tablets as for regular tables.)

WAL retention time, in seconds, to be used for tables for which a CDC stream was created. Used in both xCluster and CDCSDK.

Default: `14400` (4 hours)
Default: `28800` (8 hours)

##### --enable_tablet_split_of_cdcsdk_streamed_tables

8 changes: 7 additions & 1 deletion docs/content/preview/reference/configuration/yb-tserver.md
@@ -1328,7 +1328,13 @@ Default: `102400`

The time period, in milliseconds, after which the intents will be cleaned up if there is no client polling for the change records.

Default: `14400000` (4 hours)
Default: `28800000` (8 hours)

##### --cdc_wal_retention_time_secs

WAL retention time, in seconds, to be used for tables for which a CDC stream was created. Used in both xCluster and CDCSDK.

Default: `28800` (8 hours)
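
Because the two flags use different units, it is easy to misread their values. The sketch below is an illustrative check of the old and new defaults quoted in this change; the `flag_duration` helper is hypothetical and infers the unit from the flag name suffix.

```python
from datetime import timedelta


def flag_duration(name: str, value: int) -> timedelta:
    """Interpret a retention flag value using the unit suffix in the flag name."""
    if name.endswith("_ms"):
        return timedelta(milliseconds=value)
    if name.endswith("_secs"):
        return timedelta(seconds=value)
    raise ValueError(f"unknown unit suffix in flag name: {name}")


# Old and new defaults as quoted in the reference text above.
for name, old, new in [
    ("cdc_intent_retention_ms", 14_400_000, 28_800_000),
    ("cdc_wal_retention_time_secs", 14_400, 28_800),
]:
    print(name, flag_duration(name, old), "->", flag_duration(name, new))
```

Both flags move from 4 hours to 8 hours, so the parenthetical durations in the reference entries agree with the raw values.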

##### --cdcsdk_table_processing_limit_per_run

@@ -28,6 +28,14 @@ CDC retains resources (such as WAL segments) that contain information related to

Retaining resources has an impact on the system. Clients are expected to consume these transactions within configurable duration limits. Resources will be released if the duration exceeds these configured limits.

Use the [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag to control the duration for which resources are retained.
Use the [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) and [cdc_wal_retention_time_secs](../../../../reference/configuration/yb-tserver/#cdc_wal_retention_time_secs) flags to control the duration for which resources are retained.

Resources are retained for each tablet of a table that is part of a database whose changes are being consumed using a replication slot. This includes those tables that may not be currently part of the publication specification.

Starting from 2024.2.1, the default data retention period for Change Data Capture (CDC) is 8 hours, with support for a maximum retention of up to 24 hours. Prior to 2024.2.1, the default retention for CDC was 4 hours.

{{< warning title="Important" >}}
When using replica identity FULL or DEFAULT, CDC preserves previous row values for UPDATE and DELETE operations. To do this, it retains history for each row in the database by suspending compaction. Compaction is halted by setting retention barriers that prevent cleanup of history for rows that are yet to be streamed to the CDC client. These retention barriers are managed dynamically and advance only after the CDC events are streamed and explicitly acknowledged by the client, which then allows compaction of history for the streamed rows.

The [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag governs the maximum retention period, defaulting to 8 hours. Be aware that an extended interruption in CDC consumption with these replica identities may degrade read performance. Because compaction is halted in the database with these replica identities, reads must traverse multiple SST files, which makes key lookups inefficient.
{{< /warning >}}
@@ -526,12 +526,16 @@ You can use several flags to fine-tune YugabyteDB's CDC behavior. These flags ar

## Retaining data for longer durations

To increase retention of data for CDC, change the two flags, `cdc_intent_retention_ms` and `cdc_wal_retention_time_secs` as required.
The following flags control the retention of data required by CDC:
- `cdc_wal_retention_time_secs` (default value: 28800s)
- `cdc_intent_retention_ms` (default value: 28800000ms)

{{< warning title="Important" >}}
Starting from 2024.2.1, the default data retention period for Change Data Capture (CDC) is 8 hours, with support for a maximum retention of up to 24 hours. Prior to 2024.2.1, the default retention for CDC was 4 hours.

Longer values of `cdc_intent_retention_ms`, coupled with longer CDC lags (periods of downtime where the client is not requesting changes) can result in increased memory footprint in the YB-TServer and affect read performance.
{{< warning title="Important" >}}
When using before image modes ALL, FULL_ROW_NEW_IMAGE or MODIFIED_COLUMNS_OLD_AND_NEW_IMAGES, CDC preserves previous row values for UPDATE and DELETE operations. To do this, it retains history for each row in the database by suspending compaction. Compaction is halted by setting retention barriers that prevent cleanup of history for rows that are yet to be streamed to the CDC client. These retention barriers are managed dynamically and advance only after the CDC events are streamed and explicitly acknowledged by the client, which then allows compaction of history for the streamed rows.

The [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag governs the maximum retention period, defaulting to 8 hours. Be aware that an extended interruption in CDC consumption with these before image modes may degrade read performance. Because compaction is halted in the database with these before image modes, reads must traverse multiple SST files, which makes key lookups inefficient.
{{< /warning >}}

## Content-based routing
2 changes: 1 addition & 1 deletion docs/content/stable/reference/configuration/yb-master.md
@@ -893,7 +893,7 @@ Default: `0` (Use the same default number of tablets as for regular tables.)

WAL retention time, in seconds, to be used for tables for which a CDC stream was created. Used in both xCluster and CDCSDK.

Default: `14400` (4 hours)
Default: `28800` (8 hours)

##### --enable_tablet_split_of_cdcsdk_streamed_tables

8 changes: 7 additions & 1 deletion docs/content/stable/reference/configuration/yb-tserver.md
@@ -1336,7 +1336,13 @@ Default: `102400`

The time period, in milliseconds, after which the intents will be cleaned up if there is no client polling for the change records.

Default: `14400000` (4 hours)
Default: `28800000` (8 hours)

##### --cdc_wal_retention_time_secs

WAL retention time, in seconds, to be used for tables for which a CDC stream was created. Used in both xCluster and CDCSDK.

Default: `28800` (8 hours)

##### --cdcsdk_table_processing_limit_per_run
