Background
The response to a merge commit request has two modes:
sync: The BE adds data to the buffer, waits for the load to finish, and returns a response with the load result.
async: The BE adds data to the buffer and returns a response immediately. It cannot guarantee that the load succeeds, so the client needs to poll the load state from the FE later (via HTTP _get_load_state).
The synchronous mode returns the final result to the client, simplifying subsequent processing on the client side and reducing the pressure on the FE HTTP server caused by polling load state. When used in conjunction with brpc, it does not block the client because both brpc client and server can handle requests asynchronously. The combination of sync mode and brpc is a good choice.
Currently, for sync mode, each request on the BE side polls FE periodically to obtain the load result (via thrift rpc getLoadTxnStatus). This approach is simple but has the following drawbacks, especially under high concurrency:
Requests merged into a single transaction do not share the rpc result, leading to redundant rpc calls.
The polling mechanism is inefficient: a small polling interval increases the FE rpc pressure, while a large interval reduces real-time responsiveness.
Therefore, we aim to optimize how the BE obtains the load result.
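To make the first drawback concrete, here is a rough back-of-the-envelope sketch of how many `getLoadTxnStatus` calls a single transaction generates when every request polls on its own, compared with sharing one poll result per transaction. The function names and all numbers (1000 merged requests, a 500 ms load, a 50 ms polling interval) are hypothetical and chosen only to show the scaling; they are not measurements.

```python
def per_request_polls(requests_per_txn: int, txn_duration_ms: int, poll_interval_ms: int) -> int:
    """RPCs sent when every request polls the FE on its own until the txn finishes."""
    polls_per_request = -(-txn_duration_ms // poll_interval_ms)  # ceiling division
    return requests_per_txn * polls_per_request


def shared_polls(txn_duration_ms: int, poll_interval_ms: int) -> int:
    """RPCs sent if one poll result were shared by all requests of the txn."""
    return -(-txn_duration_ms // poll_interval_ms)  # ceiling division


# Hypothetical workload: 1000 requests merged into one 500 ms transaction.
print(per_request_polls(1000, 500, 50))  # 1000 requests * 10 polls each
print(shared_polls(500, 50))             # 10 polls in total
```

Under these assumptions the per-request approach issues 1000 times as many RPCs, and shrinking the interval to improve latency multiplies both figures.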
Solution
Overall, we aim to address the issue through the following approaches:
FE Actively Sends Results to Coordinator:
After the load is completed, the FE will proactively send the result (transaction state) to the coordinator backend. This reduces unnecessary polling on the BE side and improves real-time responsiveness.
Introduce a TxnStateCache Component on the BE Side:
This component will manage transaction states and provide the following functionalities:
Caching FE-Sent Transaction States:
In real-time imports, only the most recent transactions are used. An LRUCache can store a sufficient number of transactions while automatically evicting older ones.
Subscription Mechanism:
Each load request can subscribe to the state of a transaction. Once the transaction state is updated, the request will be notified proactively.
Low-Frequency Polling for Subscribed Transactions:
For transactions with active subscriptions, the BE will poll the FE for their state at a low frequency. This addresses scenarios where FE fails to push results (e.g., FE leader switch or crash), avoiding unnecessary waiting.
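The subscription mechanism can be pictured with a condition variable: requests block on the transaction state, and a single update, whether pushed by the FE or obtained by a poll, wakes all of them. The following is a minimal Python sketch of that idea, not the BE implementation (which is C++). The method names, the placeholder non-final state `"PREPARED"`, and the rule that a final state is never overwritten are assumptions made for illustration.

```python
import threading
from typing import Optional

FINAL_STATES = {"VISIBLE", "ABORTED", "UNKNOWN"}


class TxnStateHandler:
    """Holds the state of one transaction and wakes waiting subscribers."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._state = "PREPARED"  # placeholder name for any non-final state
        self.num_subscribers = 0

    def update(self, state: str) -> None:
        """Called when the FE pushes a state or when a poll returns one."""
        with self._cond:
            if self._state in FINAL_STATES:
                return  # assumption: a final state is never overwritten
            self._state = state
            self._cond.notify_all()

    def wait_final(self, timeout_s: float) -> Optional[str]:
        """Block until a final state arrives; return None on timeout."""
        with self._cond:
            self.num_subscribers += 1
            try:
                reached = self._cond.wait_for(
                    lambda: self._state in FINAL_STATES, timeout=timeout_s)
                return self._state if reached else None
            finally:
                self.num_subscribers -= 1


# Three requests merged into the same transaction wait on one handler.
handler = TxnStateHandler()
results = []
waiters = [threading.Thread(target=lambda: results.append(handler.wait_final(5.0)))
           for _ in range(3)]
for t in waiters:
    t.start()
handler.update("VISIBLE")  # a single update serves all three requests
for t in waiters:
    t.join()
print(results)
```

The timeout in `wait_final` stands in for the request deadline; in the design described here, low-frequency polling is what bounds the wait when a push is lost.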
The framework is as follows:
FE Side TxnStateDispatcher
Responsible for sending the transaction state to the coordinator BE. After the load is completed, the FE submits a dispatch task to the dispatcher, which includes information such as the database, transaction id, and backend id. The dispatcher makes a best effort to send the state; if the retry limit is exceeded, the task is discarded. The dispatcher is also stateless: it does not resume sending earlier tasks after an FE leader switch or restart. In such cases, the BE-side polling mechanism resolves the transaction state.
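The best-effort behaviour can be sketched as a bounded retry loop that gives up silently. This is an illustrative Python sketch only; the real dispatcher lives in the FE (Java), and `DispatchTask`, `dispatch`, the `send` callback, and the use of `ConnectionError` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DispatchTask:
    db_name: str
    txn_id: int
    backend_id: int


def dispatch(task: DispatchTask,
             send: Callable[[DispatchTask], bool],
             max_retries: int) -> bool:
    """Try to send the state; return False if the task is dropped."""
    for _attempt in range(1 + max_retries):  # first try plus the retries
        try:
            if send(task):
                return True
        except ConnectionError:
            pass  # a transport failure counts as a failed attempt
    return False  # discarded: BE-side polling resolves the state instead


# Hypothetical sender that only succeeds on its third attempt.
attempts = []

def flaky_send(task: DispatchTask) -> bool:
    attempts.append(task.txn_id)
    return len(attempts) >= 3

delivered = dispatch(DispatchTask("db", 1001, 10004), flaky_send, max_retries=3)
print(delivered, len(attempts))
```

Nothing is persisted when a task is dropped, which is what keeps the dispatcher stateless in this sketch.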
BRPC update_transaction_state
A new brpc interface, update_transaction_state, has been added to allow the FE to send transaction states to the BE.
BE Side TxnStateCache
DynamicCache: an LRU cache whose entries are TxnStateHandler objects. A handler stores the transaction state and the subscriber information (TxnStateSubscriber), and decides whether the state should be polled. The cache manages the lifecycle of the handlers: when it reaches capacity, it evicts the oldest handler that has no subscriber. The transaction state is updated when a state is pushed by the FE or polled by TxnStatePoller, and the handler then notifies the subscribers waiting on it.
TxnStateSubscriber: each load request subscribes to the transaction state through a subscriber. It holds a reference to the TxnStateHandler so that the LRU cache does not evict the entry before the subscriber stops waiting.
TxnStatePoller: schedules and executes the transaction state poll tasks. Each task updates the transaction state after it finishes.
Pending poll tasks: the initial poll task for a transaction is submitted to the poller when the first TxnStateSubscriber is created. The task is rescheduled periodically until the transaction reaches a final state (VISIBLE/ABORTED/UNKNOWN) or no subscriber remains.
Schedule thread: a single thread that dispatches pending tasks for execution according to their execution time.
Execute thread pool: the thread pool that executes the poll tasks. A task sends the thrift rpc getLoadTxnStatus to the FE to get the current state, and then updates the TxnStateHandler.
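Two rules from the component list above lend themselves to a small sketch: the cache only evicts entries without subscribers, and a poll task is rescheduled until the transaction is final or nobody is waiting. The Python below is a simplified model under stated assumptions, not the BE code: the cache stores a subscriber count in place of a full TxnStateHandler, the poller runs on a virtual clock instead of real threads, and `run_poller`, `fake_fe`, and the placeholder state `"PREPARED"` are hypothetical.

```python
import heapq
from collections import OrderedDict

FINAL_STATES = {"VISIBLE", "ABORTED", "UNKNOWN"}


class DynamicCache:
    """LRU cache of txn_id -> subscriber count; subscribed entries are pinned."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._subs = OrderedDict()  # least recently used entry comes first

    def subscribe(self, txn_id: int) -> None:
        self._subs[txn_id] = self._subs.get(txn_id, 0) + 1
        self._subs.move_to_end(txn_id)
        self._evict()

    def unsubscribe(self, txn_id: int) -> None:
        self._subs[txn_id] -= 1
        self._evict()

    def subscribers(self, txn_id: int) -> int:
        return self._subs.get(txn_id, 0)

    def __contains__(self, txn_id: int) -> bool:
        return txn_id in self._subs

    def _evict(self) -> None:
        # Walk from oldest to newest and drop unsubscribed entries until we fit.
        for txn_id in list(self._subs):
            if len(self._subs) <= self.capacity:
                return
            if self._subs[txn_id] == 0:
                del self._subs[txn_id]


def run_poller(cache, first_tasks, poll_fn, interval_ms):
    """Simulate the schedule thread on a virtual clock; return the polls made."""
    pending = list(first_tasks)  # items are (execute_time_ms, txn_id)
    heapq.heapify(pending)
    polls = []
    while pending:
        now_ms, txn_id = heapq.heappop(pending)  # earliest execution time first
        if cache.subscribers(txn_id) == 0:
            continue  # no subscriber left: stop polling this transaction
        state = poll_fn(txn_id, now_ms)  # stands in for the getLoadTxnStatus rpc
        polls.append((now_ms, txn_id, state))
        if state not in FINAL_STATES:
            heapq.heappush(pending, (now_ms + interval_ms, txn_id))
    return polls


cache = DynamicCache(capacity=2)
cache.subscribe(1)
cache.subscribe(2)
cache.unsubscribe(1)  # txn 1 now has no subscriber
cache.subscribe(3)    # over capacity: txn 1 is evicted, txn 2 stays pinned

# Hypothetical FE: txn 2 turns VISIBLE at 300 ms, txn 3 is ABORTED at 200 ms.
def fake_fe(txn_id: int, now_ms: int) -> str:
    if txn_id == 2 and now_ms >= 300:
        return "VISIBLE"
    if txn_id == 3 and now_ms >= 200:
        return "ABORTED"
    return "PREPARED"

polls = run_poller(cache, [(100, 2), (100, 3)], fake_fe, interval_ms=100)
print(polls)
```

Note that the cache may temporarily exceed its capacity when every entry still has a subscriber; in this sketch that is the price of never evicting a handler someone is waiting on.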
You can see the overall implementation in draft PR #54676. It will be divided into two PRs.
…55001 (#55071)
This is the second PR of merge commit sync mode optimization #54995. Introduce TxnStateDispatcher on FE side. You can see #54995 for details
Signed-off-by: PengFei Li <[email protected]>
banmoy added a commit to banmoy/starrocks that referenced this issue on Jan 15, 2025