diff --git a/_posts/2024-01-16-Debezium-and-TimescaleDB.adoc b/_posts/2024-01-16-Debezium-and-TimescaleDB.adoc
index 780c442906..a8c66c5e90 100644
--- a/_posts/2024-01-16-Debezium-and-TimescaleDB.adoc
+++ b/_posts/2024-01-16-Debezium-and-TimescaleDB.adoc
@@ -22,7 +22,7 @@ TimescaleDB provides three basic building blocks/concepts:
 
 Metadata (catalog) that describes the definitions of the instances and the raw data are typically stored in `_timescaledb_internal_schema`.
 link:https://debezium.io/documentation/reference/stable/transformations/timescaledb.html[TimescaleDb SMT] connects to the database and reads and processes the metadata.
-Based on them the raw messages read from the database are enriched with the metadata stored in Kafka Connect headers that create the relation between physical data and the TimescaleDB logical constructs.
+The raw messages read from the database are then enriched with the metadata stored in Kafka Connect headers, creating the relation between the physical data and the TimescaleDB logical constructs.
 
 == Demonstration
 
@@ -50,7 +50,7 @@ In the next step it is necessary to register the Debezium PostgreSQL connector t
 $ curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors/ -d @register-timescaledb.yaml
 ----
 
-The registration request file differs from the regular one with the addition of lines
+The registration request file differs from the regular one by the addition of these lines
 
 [source, json]
 ----
@@ -72,9 +72,9 @@ The registration request file differs from the regular one with the addition of
 
 === Hypertables
 
-The connector will capture the internal TimescaleDB schema with the physical tables containing the raw data and the `TimescaleDb` SMT will be applied to enrich messages and route them to correctly named topics based on the logical names.
+The connector will capture the internal TimescaleDB schema with the physical tables containing the raw data, and the `TimescaleDb` SMT will be applied to enrich messages and route them to the correctly named topics based on the logical names.
 The SMT configuration options contain information needed to connect to the database.
-In this case, the `conditions` hypertable will be physically stored in `_timescaledb_internal._hyper_1_1_chunk` and when processed by the SMT it will be re-routed to `timescaledb.public.conditions` topic that is named according to fixed configured prefix `timescaledb` and logical name `public.conditions` that conforms to the hypertable name.
+In this case, the `conditions` hypertable will be physically stored in `_timescaledb_internal._hyper_1_1_chunk`, and when processed by the SMT, it will be re-routed to the `timescaledb.public.conditions` topic, named according to the fixed configured prefix `timescaledb` and the logical name `public.conditions`, which conforms to the hypertable name.
 
 Let's add a few more measurements to the table
 
@@ -189,8 +189,8 @@ So the topic contains two or more messages calculated for two different location
 
 === Compression
 
-The TimescaleDB SMT does not enhance compressed chunks of data (physical table records), only as a by-product of them being stored in hypertable.
-The compressed data are captured and stored in the Kafka topic.
+The TimescaleDB SMT does not enhance compressed chunks of data (physical table records); they are captured only as a by-product of being stored in a hypertable.
+The compressed data is captured and stored in the Kafka topic.
 Typically, messages with compressed chunks are dropped and are not processed by subsequent jobs in the pipeline.
 
 Let's enable compression for the hypertable and compress it
 
@@ -216,5 +216,5 @@ docker-compose -f docker-compose-timescaledb.yaml down
 ----
 
 == Conclusion
-In this post, we have demonstrated the capturing of data from TimescaleDB time-series database and their processing by TimescaleDb SMT.
+In this post, we have demonstrated the capturing of data from the TimescaleDB time-series database and its processing by the TimescaleDb SMT.
 We have shown how messages are routed and enriched depending on hypertables and continuous aggregates acting as the source of data.
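For reference, the registration additions that the `@@ -50,7 +50,7 @@` hunk refers to (the `[source, json]` block itself sits just outside the diff context) follow roughly the pattern sketched below. This is a minimal sketch based on the linked TimescaleDb SMT documentation rather than the post's actual `register-timescaledb.yaml`; the SMT class name, the `database.*` option names, and all connection values are assumptions to be verified against the post, and since JSON allows no comments, treat every value as a placeholder.

[source, json]
----
"transforms": "timescaledb",
"transforms.timescaledb.type": "io.debezium.transforms.timescaledb.TimescaleDb",
"transforms.timescaledb.database.hostname": "timescaledb",
"transforms.timescaledb.database.port": "5432",
"transforms.timescaledb.database.user": "postgres",
"transforms.timescaledb.database.password": "postgres",
"transforms.timescaledb.database.dbname": "postgres"
----

The SMT is given its own database connection because it reads the TimescaleDB catalog to map physical chunk tables back to hypertables and continuous aggregates, which is what drives the header enrichment and topic re-routing described in the post.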