refactor(batch-exports): Use async producer in Redshift export #25872
Problem
Redshift is an inefficient and often very slow destination. When inserts stall, connections to ClickHouse get dropped for lack of progress on the insert side, so we usually end up retrying a lot.
Similar to the work done for BigQuery, this PR implements async production of events for the Redshift batch export. This gives us a buffer to hold events, so we can keep using the ClickHouse connection even when Redshift is too slow.
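For reviewers unfamiliar with the pattern, here is a minimal sketch of async production with a buffer, using only `asyncio`. It is not the code in this PR: `produce`, `consume`, `export`, and `max_buffered` are hypothetical names, `source` stands in for the ClickHouse read, and `insert` stands in for the Redshift write.

```python
import asyncio


async def produce(source, queue):
    """Read batches from the source and buffer them as they arrive."""
    async for batch in source:
        await queue.put(batch)  # waits only when the buffer is full
    await queue.put(None)  # sentinel: no more batches


async def consume(queue, insert):
    """Drain the buffer at whatever pace the destination allows."""
    total = 0
    while True:
        batch = await queue.get()
        if batch is None:
            return total
        await insert(batch)
        total += len(batch)


async def export(source, insert, max_buffered=10):
    """Run producer and consumer concurrently; return the number of rows inserted."""
    queue = asyncio.Queue(maxsize=max_buffered)
    producer = asyncio.create_task(produce(source, queue))
    consumer = asyncio.create_task(consume(queue, insert))
    done, pending = await asyncio.wait(
        {producer, consumer}, return_when=asyncio.FIRST_EXCEPTION
    )
    for task in pending:
        task.cancel()  # stop the other side if one side failed
    for task in done:
        task.result()  # re-raise any exception from a finished task
    return consumer.result()
```

The buffer size is a trade-off in this sketch: a larger `max_buffered` lets the reader run further ahead of a slow insert at the cost of memory, while a full buffer pauses the reader until the consumer catches up.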
This PR also adds heartbeating support to the Redshift export so that we can resume when a batch export fails instead of starting over, thus guaranteeing we will eventually finish.
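The resume-from-heartbeat idea can be sketched as follows. This is an illustration under assumptions, not the implementation in this PR: `export_with_heartbeat`, the `(cursor, rows)` batch shape, the `heartbeat` callable, and the `last_heartbeat` argument are all hypothetical, and the sketch assumes cursors increase monotonically across batches.

```python
def export_with_heartbeat(batches, insert, heartbeat, last_heartbeat=None):
    """Insert batches in order, recording progress after each one.

    batches: iterable of (cursor, rows) pairs with increasing cursors.
    last_heartbeat: cursor recorded by a previous failed attempt, if any.
    """
    exported = 0
    for cursor, rows in batches:
        if last_heartbeat is not None and cursor <= last_heartbeat:
            continue  # already inserted by the previous attempt
        insert(rows)
        heartbeat(cursor)  # record progress only after a successful insert
        exported += len(rows)
    return exported
```

Because progress is recorded only after a successful insert, a retry that receives the last recorded cursor skips completed batches and picks up at the first one that did not finish.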
Changes
Does this work well for both Cloud and self-hosted?
How did you test this code?
Ran all existing Redshift tests. All passed.