From 47c71dc5482d53417d107aeb7c2658aa76ec58b6 Mon Sep 17 00:00:00 2001 From: Siddharth Shah Date: Mon, 13 Jan 2025 21:52:14 +0530 Subject: [PATCH 1/8] doc changes --- .../advanced-configuration.md | 10 +++++++++- .../cdc-get-started.md | 10 +++++++--- .../preview/reference/configuration/yb-master.md | 2 +- .../preview/reference/configuration/yb-tserver.md | 8 +++++++- .../advanced-configuration.md | 10 +++++++++- .../cdc-get-started.md | 10 +++++++--- .../stable/reference/configuration/yb-master.md | 2 +- .../stable/reference/configuration/yb-tserver.md | 8 +++++++- 8 files changed, 48 insertions(+), 12 deletions(-) diff --git a/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md b/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md index 6809e8850412..831640235014 100644 --- a/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md +++ b/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md @@ -30,6 +30,14 @@ CDC retains resources (such as WAL segments) that contain information related to Retaining resources has an impact on the system. Clients are expected to consume these transactions within configurable duration limits. Resources will be released if the duration exceeds these configured limits. -Use the [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag to control the duration for which resources are retained. +Use the [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) & [cdc_wal_retention_time_secs](../../../../reference/configuration/yb-tserver/#cdc_wal_retention_time_secs) flag to control the duration for which resources are retained. Resources are retained for each tablet of a table that is part of a database whose changes are being consumed using a replication slot. 
This includes those tables that may not be currently part of the publication specification. + +Starting from 2024.2.1, the data retention configuration for Change Data Capture (CDC) has been updated. The default retention period is now set to 8 hours, with support for maximum retention up to 24 hours. Prior to 2024.2.1, the default retention for CDC is 4 hours. + +{{< warning title="Important" >}} +When using replica identity FULL or DEFAULT, CDC preserves previous row values for UPDATE and DELETE operations. This is accomplished by retaining history for each row in the database through a suspension of the compaction process. Compaction process is halted by setting retention barriers to prevent cleanup of history for those rows that are yet to be streamed to the CDC client. These retention barriers are dynamically managed and advanced only after the CDC events are streamed and explicitly acknowledged by the client, thus allowing compaction of history for streamed rows. + +The [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag governs the maximum retention period, defaulting to 8 hours. Users should be aware that any interruption in CDC consumption for extended periods with the above-mentioned replica identities may lead to potential read performance degradation. This happens because compaction activities are halted in the database with these replica identities, leading to inefficient key lookups as reads must traverse multiple SST files, which degrades read performance. 
+{{< /warning >}} diff --git a/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md b/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md index 6942824e333c..d8415d8f0c11 100644 --- a/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md +++ b/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md @@ -535,12 +535,16 @@ You can use several flags to fine-tune YugabyteDB's CDC behavior. These flags ar ## Retaining data for longer durations -To increase retention of data for CDC, change the two flags, `cdc_intent_retention_ms` and `cdc_wal_retention_time_secs` as required. +The following flags are responsible for retention of data required by CDC: +- `cdc_wal_retention_time_secs` (default value: 28800s) +- `cdc_intent_retention_ms` (default value: 28800000ms) -{{< warning title="Important" >}} +Starting from 2024.2.1, the data retention configuration for Change Data Capture (CDC) has been updated. The default retention period is now set to 8 hours, with support for maximum retention up to 24 hours. Prior to 2024.2.1, the default retention for CDC is 4 hours. -Longer values of `cdc_intent_retention_ms`, coupled with longer CDC lags (periods of downtime where the client is not requesting changes) can result in increased memory footprint in the YB-TServer and affect read performance. +{{< warning title="Important" >}} +When using before image modes ALL, FULL_ROW_NEW_IMAGE or MODIFIED_COLUMNS_OLD_AND_NEW_IMAGES, CDC preserves previous row values for UPDATE and DELETE operations. This is accomplished by retaining history for each row in the database through a suspension of the compaction process. Compaction process is halted by setting retention barriers to prevent cleanup of history for those rows that are yet to be streamed to the CDC client. 
These retention barriers are dynamically managed and advanced only after the CDC events are streamed and explicitly acknowledged by the client, thus allowing compaction of streamed rows. +The [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag governs the maximum retention period, defaulting to 8 hours. Users should be aware that any interruption in CDC consumption for extended periods with the above-mentioned before image modes may lead to potential read performance degradation. This happens because compaction activities are halted in the database with these before image modes, leading to inefficient key lookups as reads must traverse multiple SST files, which degrades read performance. {{< /warning >}} ## Content-based routing diff --git a/docs/content/preview/reference/configuration/yb-master.md b/docs/content/preview/reference/configuration/yb-master.md index c0c9975224cc..0ce836f20154 100644 --- a/docs/content/preview/reference/configuration/yb-master.md +++ b/docs/content/preview/reference/configuration/yb-master.md @@ -925,7 +925,7 @@ Default: `0` (Use the same default number of tablets as for regular tables.) WAL retention time, in seconds, to be used for tables for which a CDC stream was created. Used in both xCluster and CDCSDK. -Default: `14400` (4 hours) +Default: `28800` (8 hours) ##### --enable_tablet_split_of_cdcsdk_streamed_tables diff --git a/docs/content/preview/reference/configuration/yb-tserver.md b/docs/content/preview/reference/configuration/yb-tserver.md index d2a66bccb4d5..bb2dfb0a94ff 100644 --- a/docs/content/preview/reference/configuration/yb-tserver.md +++ b/docs/content/preview/reference/configuration/yb-tserver.md @@ -1330,7 +1330,13 @@ Default: `102400` The time period, in milliseconds, after which the intents will be cleaned up if there is no client polling for the change records. 
-Default: `14400000` (4 hours) +Default: `28800000` (8 hours) + +##### --cdc_wal_retention_time_secs + +WAL retention time, in seconds, to be used for tables for which a CDC stream was created. Used in both xCluster and CDCSDK. + +Default: `28800` (8 hours) ##### --cdcsdk_table_processing_limit_per_run diff --git a/docs/content/stable/develop/change-data-capture/using-logical-replication/advanced-configuration.md b/docs/content/stable/develop/change-data-capture/using-logical-replication/advanced-configuration.md index 4b34d5255a78..353fc4612bfe 100644 --- a/docs/content/stable/develop/change-data-capture/using-logical-replication/advanced-configuration.md +++ b/docs/content/stable/develop/change-data-capture/using-logical-replication/advanced-configuration.md @@ -28,6 +28,14 @@ CDC retains resources (such as WAL segments) that contain information related to Retaining resources has an impact on the system. Clients are expected to consume these transactions within configurable duration limits. Resources will be released if the duration exceeds these configured limits. -Use the [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag to control the duration for which resources are retained. +Use the [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) & [cdc_wal_retention_time_secs](../../../../reference/configuration/yb-tserver/#cdc_wal_retention_time_secs) flag to control the duration for which resources are retained. Resources are retained for each tablet of a table that is part of a database whose changes are being consumed using a replication slot. This includes those tables that may not be currently part of the publication specification. + +Starting from 2024.2.1, the data retention configuration for Change Data Capture (CDC) has been updated. The default retention period is now set to 8 hours, with support for maximum retention up to 24 hours. 
Prior to 2024.2.1, the default retention for CDC is 4 hours. + +{{< warning title="Important" >}} +When using replica identity FULL or DEFAULT, CDC preserves previous row values for UPDATE and DELETE operations. This is accomplished by retaining history for each row in the database through a suspension of the compaction process. Compaction process is halted by setting retention barriers to prevent cleanup of history for those rows that are yet to be streamed to the CDC client. These retention barriers are dynamically managed and advanced only after the CDC events are streamed and explicitly acknowledged by the client, thus allowing compaction of history for streamed rows. + +The [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag governs the maximum retention period, defaulting to 8 hours. Users should be aware that any interruption in CDC consumption for extended periods with the above-mentioned replica identities may lead to potential read performance degradation. This happens because compaction activities are halted in the database with these replica identities, leading to inefficient key lookups as reads must traverse multiple SST files, which degrades read performance. +{{< /warning >}} diff --git a/docs/content/stable/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md b/docs/content/stable/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md index 822f5ec7d24c..1512dabdb5ca 100644 --- a/docs/content/stable/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md +++ b/docs/content/stable/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md @@ -532,12 +532,16 @@ You can use several flags to fine-tune YugabyteDB's CDC behavior. 
These flags ar ## Retaining data for longer durations -To increase retention of data for CDC, change the two flags, `cdc_intent_retention_ms` and `cdc_wal_retention_time_secs` as required. +The following flags are responsible for retention of data required by CDC: +- `cdc_wal_retention_time_secs` (default value: 28800s) +- `cdc_intent_retention_ms` (default value: 28800000ms) -{{< warning title="Important" >}} +Starting from 2024.2.1, the data retention configuration for Change Data Capture (CDC) has been updated. The default retention period is now set to 8 hours, with support for maximum retention up to 24 hours. Prior to 2024.2.1, the default retention for CDC is 4 hours. -Longer values of `cdc_intent_retention_ms`, coupled with longer CDC lags (periods of downtime where the client is not requesting changes) can result in increased memory footprint in the YB-TServer and affect read performance. +{{< warning title="Important" >}} +When using before image modes ALL, FULL_ROW_NEW_IMAGE or MODIFIED_COLUMNS_OLD_AND_NEW_IMAGES, CDC preserves previous row values for UPDATE and DELETE operations. This is accomplished by retaining history for each row in the database through a suspension of the compaction process. Compaction process is halted by setting retention barriers to prevent cleanup of history for those rows that are yet to be streamed to the CDC client. These retention barriers are dynamically managed and advanced only after the CDC events are streamed and explicitly acknowledged by the client, thus allowing compaction of streamed rows. +The [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag governs the maximum retention period, defaulting to 8 hours. Users should be aware that any interruption in CDC consumption for extended periods with the above-mentioned before image modes may lead to potential read performance degradation. 
This happens because compaction activities are halted in the database with these before image modes, leading to inefficient key lookups as reads must traverse multiple SST files, which degrades read performance. {{< /warning >}} ## Content-based routing diff --git a/docs/content/stable/reference/configuration/yb-master.md b/docs/content/stable/reference/configuration/yb-master.md index 055ca5c7e1e4..f006b45381cc 100644 --- a/docs/content/stable/reference/configuration/yb-master.md +++ b/docs/content/stable/reference/configuration/yb-master.md @@ -933,7 +933,7 @@ Default: `0` (Use the same default number of tablets as for regular tables.) WAL retention time, in seconds, to be used for tables for which a CDC stream was created. Used in both xCluster and CDCSDK. -Default: `14400` (4 hours) +Default: `28800` (8 hours) ##### --enable_tablet_split_of_cdcsdk_streamed_tables diff --git a/docs/content/stable/reference/configuration/yb-tserver.md b/docs/content/stable/reference/configuration/yb-tserver.md index 9010fe055046..86c034f408cd 100644 --- a/docs/content/stable/reference/configuration/yb-tserver.md +++ b/docs/content/stable/reference/configuration/yb-tserver.md @@ -1338,7 +1338,13 @@ Default: `102400` The time period, in milliseconds, after which the intents will be cleaned up if there is no client polling for the change records. -Default: `14400000` (4 hours) +Default: `28800000` (8 hours) + +##### --cdc_wal_retention_time_secs + +WAL retention time, in seconds, to be used for tables for which a CDC stream was created. Used in both xCluster and CDCSDK. 
+ +Default: `28800` (8 hours) ##### --cdcsdk_table_processing_limit_per_run From f8ca29ec91e941ccd0a166778a6c598161d61b62 Mon Sep 17 00:00:00 2001 From: siddharth2411 <43139012+siddharth2411@users.noreply.github.com> Date: Wed, 15 Jan 2025 16:00:07 +0530 Subject: [PATCH 2/8] Update docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md Co-authored-by: Dwight Hodge <79169168+ddhodge@users.noreply.github.com> --- .../using-logical-replication/advanced-configuration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md b/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md index 831640235014..0d896a7cbc92 100644 --- a/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md +++ b/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md @@ -30,7 +30,7 @@ CDC retains resources (such as WAL segments) that contain information related to Retaining resources has an impact on the system. Clients are expected to consume these transactions within configurable duration limits. Resources will be released if the duration exceeds these configured limits. -Use the [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) & [cdc_wal_retention_time_secs](../../../../reference/configuration/yb-tserver/#cdc_wal_retention_time_secs) flag to control the duration for which resources are retained. +Use the [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) and [cdc_wal_retention_time_secs](../../../../reference/configuration/yb-tserver/#cdc-wal-retention-time-secs) flags to control the duration for which resources are retained. 
Resources are retained for each tablet of a table that is part of a database whose changes are being consumed using a replication slot. This includes those tables that may not be currently part of the publication specification. From 337334f05c97d86a0756ea99eefdc91c68369e73 Mon Sep 17 00:00:00 2001 From: siddharth2411 <43139012+siddharth2411@users.noreply.github.com> Date: Wed, 15 Jan 2025 16:00:29 +0530 Subject: [PATCH 3/8] Update docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md Co-authored-by: Dwight Hodge <79169168+ddhodge@users.noreply.github.com> --- .../using-logical-replication/advanced-configuration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md b/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md index 0d896a7cbc92..a173406f46f9 100644 --- a/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md +++ b/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md @@ -34,7 +34,7 @@ Use the [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver Resources are retained for each tablet of a table that is part of a database whose changes are being consumed using a replication slot. This includes those tables that may not be currently part of the publication specification. -Starting from 2024.2.1, the data retention configuration for Change Data Capture (CDC) has been updated. The default retention period is now set to 8 hours, with support for maximum retention up to 24 hours. Prior to 2024.2.1, the default retention for CDC is 4 hours. +Starting from v2024.2.1, the default data retention for CDC is 8 hours, with support for maximum retention up to 24 hours. Prior to v2024.2.1, the default retention for CDC is 4 hours. 
{{< warning title="Important" >}} When using replica identity FULL or DEFAULT, CDC preserves previous row values for UPDATE and DELETE operations. This is accomplished by retaining history for each row in the database through a suspension of the compaction process. Compaction process is halted by setting retention barriers to prevent cleanup of history for those rows that are yet to be streamed to the CDC client. These retention barriers are dynamically managed and advanced only after the CDC events are streamed and explicitly acknowledged by the client, thus allowing compaction of history for streamed rows. From 1f9e8bb331bd3ae8ff0a1150b2667815d4255131 Mon Sep 17 00:00:00 2001 From: siddharth2411 <43139012+siddharth2411@users.noreply.github.com> Date: Wed, 15 Jan 2025 16:02:38 +0530 Subject: [PATCH 4/8] Update docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md Co-authored-by: Dwight Hodge <79169168+ddhodge@users.noreply.github.com> --- .../using-logical-replication/advanced-configuration.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md b/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md index a173406f46f9..083487fc2d09 100644 --- a/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md +++ b/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md @@ -39,5 +39,5 @@ Starting from v2024.2.1, the default data retention for CDC is 8 hours, with sup {{< warning title="Important" >}} When using replica identity FULL or DEFAULT, CDC preserves previous row values for UPDATE and DELETE operations. This is accomplished by retaining history for each row in the database through a suspension of the compaction process. 
Compaction process is halted by setting retention barriers to prevent cleanup of history for those rows that are yet to be streamed to the CDC client. These retention barriers are dynamically managed and advanced only after the CDC events are streamed and explicitly acknowledged by the client, thus allowing compaction of history for streamed rows. -The [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag governs the maximum retention period, defaulting to 8 hours. Users should be aware that any interruption in CDC consumption for extended periods with the above-mentioned replica identities may lead to potential read performance degradation. This happens because compaction activities are halted in the database with these replica identities, leading to inefficient key lookups as reads must traverse multiple SST files, which degrades read performance. +The [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag governs the maximum retention period (default 8 hours). Be aware that any interruption in CDC consumption for extended periods using these replica identities may degrade read performance. This happens because compaction activities are halted in the database when these replica identities are used, leading to inefficient key lookups as reads must traverse multiple SST files. 
{{< /warning >}} From 11b434891d2c9b310c28e12fb2341cdce8016638 Mon Sep 17 00:00:00 2001 From: siddharth2411 <43139012+siddharth2411@users.noreply.github.com> Date: Wed, 15 Jan 2025 16:02:47 +0530 Subject: [PATCH 5/8] Update docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md Co-authored-by: Dwight Hodge <79169168+ddhodge@users.noreply.github.com> --- .../using-yugabytedb-grpc-replication/cdc-get-started.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md b/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md index d8415d8f0c11..d423378ddd56 100644 --- a/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md +++ b/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md @@ -533,7 +533,7 @@ You can use several flags to fine-tune YugabyteDB's CDC behavior. These flags ar - [cdc_max_stream_intent_records](../../../../reference/configuration/yb-tserver/#cdc-max-stream-intent-records) - Controls how many intent records can be streamed in a single `GetChanges` call. Essentially, intents of large transactions are broken down into batches of size equal to this flag, hence this controls how many batches of `GetChanges` calls are needed to stream the entire large transaction. The default value of this flag is 1680, and transactions with intents less than this value are streamed in a single batch. The value of this flag can be increased, if the workload has larger transactions and CDC throughput needs to be increased. Note that high values of this flag can increase the latency of each `GetChanges` call. 
-## Retaining data for longer durations +## Retain data for longer durations The following flags are responsible for retention of data required by CDC: - `cdc_wal_retention_time_secs` (default value: 28800s) From 686332283b3d683f079b58671f5872874b1b5af8 Mon Sep 17 00:00:00 2001 From: siddharth2411 <43139012+siddharth2411@users.noreply.github.com> Date: Wed, 15 Jan 2025 16:03:02 +0530 Subject: [PATCH 6/8] Update docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md Co-authored-by: Dwight Hodge <79169168+ddhodge@users.noreply.github.com> --- .../using-yugabytedb-grpc-replication/cdc-get-started.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md b/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md index d423378ddd56..482bd45d6d7b 100644 --- a/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md +++ b/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md @@ -535,7 +535,8 @@ You can use several flags to fine-tune YugabyteDB's CDC behavior. 
These flags ar ## Retain data for longer durations -The following flags are responsible for retention of data required by CDC: +The following flags control the retention of data required by CDC: + - `cdc_wal_retention_time_secs` (default value: 28800s) - `cdc_intent_retention_ms` (default value: 28800000ms) From 1d5d4c77591c2223b14b3ddcfb00cf15dd408b9b Mon Sep 17 00:00:00 2001 From: siddharth2411 <43139012+siddharth2411@users.noreply.github.com> Date: Wed, 15 Jan 2025 16:03:22 +0530 Subject: [PATCH 7/8] Update docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md Co-authored-by: Dwight Hodge <79169168+ddhodge@users.noreply.github.com> --- .../using-yugabytedb-grpc-replication/cdc-get-started.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md b/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md index 482bd45d6d7b..93890f2401a9 100644 --- a/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md +++ b/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md @@ -537,7 +537,7 @@ You can use several flags to fine-tune YugabyteDB's CDC behavior. These flags ar The following flags control the retention of data required by CDC: -- `cdc_wal_retention_time_secs` (default value: 28800s) +- `cdc_wal_retention_time_secs` (default 28800s) - `cdc_intent_retention_ms` (default value: 28800000ms) Starting from 2024.2.1, the data retention configuration for Change Data Capture (CDC) has been updated. The default retention period is now set to 8 hours, with support for maximum retention up to 24 hours. Prior to 2024.2.1, the default retention for CDC is 4 hours. 
From 68696650f100d3ed6ba84066634939c4bbd00a6c Mon Sep 17 00:00:00 2001 From: siddharth2411 <43139012+siddharth2411@users.noreply.github.com> Date: Wed, 15 Jan 2025 16:05:23 +0530 Subject: [PATCH 8/8] Apply suggestions from code review Co-authored-by: Dwight Hodge <79169168+ddhodge@users.noreply.github.com> --- .../advanced-configuration.md | 2 +- .../cdc-get-started.md | 8 ++++---- .../advanced-configuration.md | 8 ++++---- .../cdc-get-started.md | 13 +++++++------ 4 files changed, 16 insertions(+), 15 deletions(-) diff --git a/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md b/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md index 083487fc2d09..725f63b8ec70 100644 --- a/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md +++ b/docs/content/preview/develop/change-data-capture/using-logical-replication/advanced-configuration.md @@ -37,7 +37,7 @@ Resources are retained for each tablet of a table that is part of a database who Starting from v2024.2.1, the default data retention for CDC is 8 hours, with support for maximum retention up to 24 hours. Prior to v2024.2.1, the default retention for CDC is 4 hours. {{< warning title="Important" >}} -When using replica identity FULL or DEFAULT, CDC preserves previous row values for UPDATE and DELETE operations. This is accomplished by retaining history for each row in the database through a suspension of the compaction process. Compaction process is halted by setting retention barriers to prevent cleanup of history for those rows that are yet to be streamed to the CDC client. These retention barriers are dynamically managed and advanced only after the CDC events are streamed and explicitly acknowledged by the client, thus allowing compaction of history for streamed rows. 
+When using FULL or DEFAULT replica identities, CDC preserves previous row values for UPDATE and DELETE operations. This is done by retaining history for each row in the database through a suspension of the compaction process. Compaction is halted by setting retention barriers to prevent cleanup of history for those rows that are yet to be streamed to the CDC client. These retention barriers are dynamically managed and advanced only after the CDC events are streamed and explicitly acknowledged by the client, thus allowing compaction of history for streamed rows. The [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag governs the maximum retention period (default 8 hours). Be aware that any interruption in CDC consumption for extended periods using these replica identities may degrade read performance. This happens because compaction activities are halted in the database when these replica identities are used, leading to inefficient key lookups as reads must traverse multiple SST files. {{< /warning >}} diff --git a/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md b/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md index 93890f2401a9..dfb33d25287a 100644 --- a/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md +++ b/docs/content/preview/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md @@ -538,14 +538,14 @@ You can use several flags to fine-tune YugabyteDB's CDC behavior. These flags ar The following flags control the retention of data required by CDC: - `cdc_wal_retention_time_secs` (default 28800s) -- `cdc_intent_retention_ms` (default value: 28800000ms) +- `cdc_intent_retention_ms` (default 28800000ms) -Starting from 2024.2.1, the data retention configuration for Change Data Capture (CDC) has been updated. 
The default retention period is now set to 8 hours, with support for maximum retention up to 24 hours. Prior to 2024.2.1, the default retention for CDC is 4 hours. +Starting from v2024.2.1, the default data retention for CDC is 8 hours, with support for maximum retention up to 24 hours. Prior to v2024.2.1, the default retention for CDC is 4 hours. {{< warning title="Important" >}} -When using before image modes ALL, FULL_ROW_NEW_IMAGE or MODIFIED_COLUMNS_OLD_AND_NEW_IMAGES, CDC preserves previous row values for UPDATE and DELETE operations. This is accomplished by retaining history for each row in the database through a suspension of the compaction process. Compaction process is halted by setting retention barriers to prevent cleanup of history for those rows that are yet to be streamed to the CDC client. These retention barriers are dynamically managed and advanced only after the CDC events are streamed and explicitly acknowledged by the client, thus allowing compaction of streamed rows. +When using ALL, FULL_ROW_NEW_IMAGE, or MODIFIED_COLUMNS_OLD_AND_NEW_IMAGES before image modes, CDC preserves previous row values for UPDATE and DELETE operations. This is done by retaining history for each row in the database through a suspension of the compaction process. Compaction is halted by setting retention barriers to prevent cleanup of history for those rows that are yet to be streamed to the CDC client. These retention barriers are dynamically managed and advanced only after the CDC events are streamed and explicitly acknowledged by the client, thus allowing compaction of streamed rows. -The [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag governs the maximum retention period, defaulting to 8 hours. Users should be aware that any interruption in CDC consumption for extended periods with the above-mentioned before image modes may lead to potential read performance degradation. 
This happens because compaction activities are halted in the database with these before image modes, leading to inefficient key lookups as reads must traverse multiple SST files, which degrades read performance. +The [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag governs the maximum retention period (default 8 hours). Be aware that any interruption in CDC consumption for extended periods using these before image modes may degrade read performance. This happens because compaction activities are halted in the database when these before image modes are used, leading to inefficient key lookups as reads must traverse multiple SST files. {{< /warning >}} ## Content-based routing diff --git a/docs/content/stable/develop/change-data-capture/using-logical-replication/advanced-configuration.md b/docs/content/stable/develop/change-data-capture/using-logical-replication/advanced-configuration.md index 353fc4612bfe..a89ecc3142dc 100644 --- a/docs/content/stable/develop/change-data-capture/using-logical-replication/advanced-configuration.md +++ b/docs/content/stable/develop/change-data-capture/using-logical-replication/advanced-configuration.md @@ -28,14 +28,14 @@ CDC retains resources (such as WAL segments) that contain information related to Retaining resources has an impact on the system. Clients are expected to consume these transactions within configurable duration limits. Resources will be released if the duration exceeds these configured limits. -Use the [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) & [cdc_wal_retention_time_secs](../../../../reference/configuration/yb-tserver/#cdc_wal_retention_time_secs) flag to control the duration for which resources are retained. 
+Use the [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) and [cdc_wal_retention_time_secs](../../../../reference/configuration/yb-tserver/#cdc-wal-retention-time-secs) flags to control the duration for which resources are retained. Resources are retained for each tablet of a table that is part of a database whose changes are being consumed using a replication slot. This includes those tables that may not be currently part of the publication specification. -Starting from 2024.2.1, the data retention configuration for Change Data Capture (CDC) has been updated. The default retention period is now set to 8 hours, with support for maximum retention up to 24 hours. Prior to 2024.2.1, the default retention for CDC is 4 hours. +Starting from v2024.2.1, the default data retention for CDC is 8 hours, with support for maximum retention up to 24 hours. Prior to v2024.2.1, the default retention for CDC was 4 hours. {{< warning title="Important" >}} -When using replica identity FULL or DEFAULT, CDC preserves previous row values for UPDATE and DELETE operations. This is accomplished by retaining history for each row in the database through a suspension of the compaction process. Compaction process is halted by setting retention barriers to prevent cleanup of history for those rows that are yet to be streamed to the CDC client. These retention barriers are dynamically managed and advanced only after the CDC events are streamed and explicitly acknowledged by the client, thus allowing compaction of history for streamed rows. +When using FULL or DEFAULT replica identities, CDC preserves previous row values for UPDATE and DELETE operations. This is done by retaining history for each row in the database through a suspension of the compaction process. Compaction is halted by setting retention barriers to prevent cleanup of history for those rows that are yet to be streamed to the CDC client. 
These retention barriers are dynamically managed and advanced only after the CDC events are streamed and explicitly acknowledged by the client, thus allowing compaction of history for streamed rows. -The [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag governs the maximum retention period, defaulting to 8 hours. Users should be aware that any interruption in CDC consumption for extended periods with the above-mentioned replica identities may lead to potential read performance degradation. This happens because compaction activities are halted in the database with these replica identities, leading to inefficient key lookups as reads must traverse multiple SST files, which degrades read performance. +The [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag governs the maximum retention period (default 8 hours). Be aware that any interruption in CDC consumption for extended periods using these replica identities may degrade read performance. This happens because compaction activities are halted in the database when these replica identities are used, leading to inefficient key lookups as reads must traverse multiple SST files. {{< /warning >}} diff --git a/docs/content/stable/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md b/docs/content/stable/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md index 1512dabdb5ca..1a15cf9becad 100644 --- a/docs/content/stable/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md +++ b/docs/content/stable/develop/change-data-capture/using-yugabytedb-grpc-replication/cdc-get-started.md @@ -530,18 +530,19 @@ You can use several flags to fine-tune YugabyteDB's CDC behavior. 
These flags ar - [cdc_max_stream_intent_records](../../../../reference/configuration/yb-tserver/#cdc-max-stream-intent-records) - Controls how many intent records can be streamed in a single `GetChanges` call. Essentially, intents of large transactions are broken down into batches of size equal to this flag, hence this controls how many batches of `GetChanges` calls are needed to stream the entire large transaction. The default value of this flag is 1680, and transactions with intents less than this value are streamed in a single batch. The value of this flag can be increased, if the workload has larger transactions and CDC throughput needs to be increased. Note that high values of this flag can increase the latency of each `GetChanges` call. -## Retaining data for longer durations +## Retain data for longer durations -The following flags are responsible for retention of data required by CDC: -- `cdc_wal_retention_time_secs` (default value: 28800s) -- `cdc_intent_retention_ms` (default value: 28800000ms) +The following flags control the retention of data required by CDC: + +- `cdc_wal_retention_time_secs` (default 28800s) +- `cdc_intent_retention_ms` (default 28800000ms) Starting from 2024.2.1, the data retention configuration for Change Data Capture (CDC) has been updated. The default retention period is now set to 8 hours, with support for maximum retention up to 24 hours. Prior to 2024.2.1, the default retention for CDC is 4 hours. {{< warning title="Important" >}} -When using before image modes ALL, FULL_ROW_NEW_IMAGE or MODIFIED_COLUMNS_OLD_AND_NEW_IMAGES, CDC preserves previous row values for UPDATE and DELETE operations. This is accomplished by retaining history for each row in the database through a suspension of the compaction process. Compaction process is halted by setting retention barriers to prevent cleanup of history for those rows that are yet to be streamed to the CDC client. 
These retention barriers are dynamically managed and advanced only after the CDC events are streamed and explicitly acknowledged by the client, thus allowing compaction of streamed rows. +When using ALL, FULL_ROW_NEW_IMAGE, or MODIFIED_COLUMNS_OLD_AND_NEW_IMAGES before image modes, CDC preserves previous row values for UPDATE and DELETE operations. This is done by retaining history for each row in the database through a suspension of the compaction process. Compaction is halted by setting retention barriers to prevent cleanup of history for those rows that are yet to be streamed to the CDC client. These retention barriers are dynamically managed and advanced only after the CDC events are streamed and explicitly acknowledged by the client, thus allowing compaction of streamed rows. -The [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag governs the maximum retention period, defaulting to 8 hours. Users should be aware that any interruption in CDC consumption for extended periods with the above-mentioned before image modes may lead to potential read performance degradation. This happens because compaction activities are halted in the database with these before image modes, leading to inefficient key lookups as reads must traverse multiple SST files, which degrades read performance. +The [cdc_intent_retention_ms](../../../../reference/configuration/yb-tserver/#cdc-intent-retention-ms) flag governs the maximum retention period (default 8 hours). Be aware that any interruption in CDC consumption for extended periods using these before image modes may degrade read performance. This happens because compaction activities are halted in the database when these before image modes are used, leading to inefficient key lookups as reads must traverse multiple SST files. {{< /warning >}} ## Content-based routing