DBZ-7148 New JDBC sink connector batch support blog post
mfvitale committed Dec 14, 2023
1 parent 5f79143 commit 829eafd
Showing 2 changed files with 3 additions and 9 deletions.
12 changes: 3 additions & 9 deletions _posts/2023-12-06-JDBC-sink-connector-batch-support.adoc
@@ -81,7 +81,6 @@ CREATE TABLE `aviation` (
 We planned to execute these tests:
 
 * 100K events from single table
-** Baseline without batch for (Oracle, MySQL, PostgreSQL, SQLServer)
 ** MySQL batch vs without batch
 * 100K events from three different table
 ** MySQL batch vs without batch
@@ -96,19 +95,14 @@ We planned to execute these tests:
 .{nbsp}
 image::100k-batch-no-batch.png[role=centered-image]
 
-_Figure 1_ illustrates the total execution time required to process 100,000 events from a single table, comparing different connectors without batch support and the MySQL connector with the default batch size.
+_Figure 1_ illustrates the total execution time required to process 100,000 events from a single table, comparing the MySQL connector with and without batch support.
 
 [NOTE]
 ====
 Despite the default values being set to `500` for both `batch.size` and `consumer.max.poll.records`, the observed actual size was reduced to `337` records due to payload size considerations.
 ====
 
-We can observe two things:
-
-* There are difference between different connectors due to specific database technology
-* As expected, the Debezium JDBC connector with batch support is faster
-
-For the following tests we will focus on MySQL since it was the one with highest execution time without the batch support.
+We can observe, as expected, that the Debezium JDBC connector with batch support is faster.
 
 .{nbsp}
 image::100k-3-tables.png[role=centered-image]
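As a sketch of the kind of configuration being tuned in these tests, a Debezium JDBC sink connector registered through the Kafka Connect REST API might look like the following. The connector name, topic, and connection details are illustrative placeholders, not values taken from the post; `batch.size` is the Debezium JDBC sink option discussed above, and the `consumer.override.` prefix is the standard Kafka Connect way to set a per-connector `max.poll.records`:

```json
{
  "name": "jdbc-sink",
  "config": {
    "connector.class": "io.debezium.connector.jdbc.JdbcSinkConnector",
    "topics": "aviation",
    "connection.url": "jdbc:mysql://mysql:3306/inventory",
    "connection.username": "user",
    "connection.password": "password",
    "insert.mode": "upsert",
    "primary.key.mode": "record_key",
    "batch.size": "10000",
    "consumer.override.max.poll.records": "10000"
  }
}
```

Note that per-connector `consumer.override.*` settings only take effect if the worker's `connector.client.config.override.policy` permits them; alternatively, `consumer.max.poll.records` can be set globally in the worker configuration, which appears to be what the post's note about the default of `500` refers to.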
@@ -137,7 +131,7 @@ It's important to note that, for these tests, we used the `org.apache.kafka.conn
 .{nbsp}
 image::1M-different-batch-size-avro.png[role=centered-image]
 We then conducted experiments with Avro, and as depicted in _Figure 5_, the results show a significant improvement.
-As expected, processing 1 million events with `batch.size=500` is slower than with `batch.size=10000`.
+As expected, processing 1,000,000 events with `batch.size=500` is slower than with `batch.size=10000`.
 Notably, in our test configuration, the optimal value for `batch.size` is 1000, resulting in the fastest processing time.
 
 Although the results are better compared to JSON, there is still some performance degradation.
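The JSON-versus-Avro comparison in the hunk above comes down to the converter configured for the sink. A minimal sketch of the two worker/connector settings involved (the schema registry URL is a placeholder assumption; the Avro converter is the standard Confluent one, which requires a schema registry):

```
# JSON converter, as used in the earlier tests
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true

# Avro converter, as used for the Figure 5 run
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://schema-registry:8081
```

Avro payloads are binary and carry only a schema ID rather than an embedded schema, which is consistent with the smaller effective payload per record and the improved results reported for the Avro runs.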