[redis-rs][core] Move connection refresh to the background #2915

Draft
wants to merge 9 commits into base: main

Conversation

GilboaAWS (Collaborator)

Issue link

This Pull Request is linked to issue (URL): [#2910]

Checklist

Before submitting the PR make sure the following are checked:

  • This Pull Request is related to one issue.
  • Commit message has a detailed description of what changed and why.
  • Tests are added or updated.
  • CHANGELOG.md and documentation files are updated.
  • Destination branch is correct - main or release
  • Create merge commit if merging release branch into main, squash otherwise.

@GilboaAWS GilboaAWS requested a review from a team as a code owner January 5, 2025 09:55
@GilboaAWS GilboaAWS requested review from barshaul and ikolomi January 5, 2025 10:04
@ikolomi ikolomi left a comment (Collaborator)

This is a substantial change in the core mechanics. It must include tests that fail without this change, to prove its necessity.

@nihohit nihohit commented Jan 6, 2025 (Contributor)

Instead of splitting your state between multiple maps and leaving room for error in reading the wrong map, consider just using an enum representing each connection state (Usable, Reconnecting, etc.) in the main connection map.

See

for an example on how to do this.

@barshaul barshaul left a comment (Collaborator)

First comments (stopped before fn update_refreshed_connection).
Please see all inline comments, and check how this behaves in cases where refresh_connections is called with only the management connection, only the user connection, or both. Also, maybe I haven't got to it yet - but the refresh task should be bound to the lifetime of the clusterNode / address, so it would be cancelled when refresh_slots removes it from the topology.
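A minimal sketch of that idea (not part of this PR; names like RefreshHandle and remove_node are hypothetical): tie the refresh task's lifetime to the node entry that owns it, so removing the address from the topology also aborts the in-flight task.

use std::collections::HashMap;
use tokio::task::JoinHandle;

// Guard that cancels the background refresh when it is dropped.
struct RefreshHandle(JoinHandle<()>);

impl Drop for RefreshHandle {
    fn drop(&mut self) {
        self.0.abort();
    }
}

struct NodeEntry {
    // ... connection details ...
    refresh_task: Option<RefreshHandle>,
}

struct ConnectionMap {
    nodes: HashMap<String, NodeEntry>,
}

impl ConnectionMap {
    fn remove_node(&mut self, address: &str) {
        // Removing the entry drops RefreshHandle, which aborts the in-flight refresh.
        self.nodes.remove(address);
    }
}

Wrapping the tokio JoinHandle in a Drop guard means refresh_slots needs no extra cancellation bookkeeping beyond removing the entry.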

// Holds all the failed addresses that started a refresh task.
pub(crate) refresh_addresses_started: DashSet<String>,
// Follow the refresh ops on the connections
pub(crate) refresh_operations: DashMap<String, RefreshState<Connection>>,

As we talked - instead of using RefreshState, use a ConnectionState that internally holds the operation / connection (same as in reconnectingConnection). For example:

enum ConnectionState<Connection> {
     Connected(ConnectionDetails<Connection>),
     Reconnecting(JoinHandle<()>),
}

pub struct ClusterNode<Connection> {
    pub user_connection: ConnectionState<Connection>,
    pub management_connection: Option<ConnectionState<Connection>>,
}

or

enum ConnectionState<Connection> {
     Connected(ClusterNode<Connection>),
     Reconnecting(JoinHandle<()>),
}

connection_map: DashMap<String, ConnectionState<Connection>>,

refresh_connections can be called for only the user connection, only the management connection, or both, so you should make sure this solution covers all cases
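A minimal sketch of how a per-connection ConnectionState could cover the user-only / management-only / both cases; the RefreshConnectionType variant names here are assumed, not taken from this PR.

use tokio::task::JoinHandle;

enum ConnectionState<Connection> {
    Connected(Connection),
    Reconnecting(JoinHandle<()>),
}

// Assumed to mirror the existing user/management/both split.
enum RefreshConnectionType {
    OnlyUserConnection,
    OnlyManagementConnection,
    AllConnections,
}

struct ClusterNode<Connection> {
    user_connection: ConnectionState<Connection>,
    management_connection: Option<ConnectionState<Connection>>,
}

impl<Connection> ClusterNode<Connection> {
    fn mark_reconnecting(&mut self, conn_type: RefreshConnectionType, handle: JoinHandle<()>) {
        match conn_type {
            RefreshConnectionType::OnlyUserConnection => {
                self.user_connection = ConnectionState::Reconnecting(handle);
            }
            RefreshConnectionType::OnlyManagementConnection => {
                if let Some(conn) = self.management_connection.as_mut() {
                    *conn = ConnectionState::Reconnecting(handle);
                }
            }
            RefreshConnectionType::AllConnections => {
                // Each connection needs its own task handle: spawn one task per
                // connection and call this method once per connection type.
                self.user_connection = ConnectionState::Reconnecting(handle);
            }
        }
    }
}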

// Follow the refresh ops on the connections
pub(crate) refresh_operations: DashMap<String, RefreshState<Connection>>,
// Holds all the refreshed addresses that are ready to be inserted into the connection_map
pub(crate) refresh_addresses_done: DashSet<String>,

I don't see a good reason for using DashSet or DashMap; these structs are only going to be used from a single point, without concurrency.
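For illustration, a minimal sketch of the same bookkeeping with plain std collections, assuming these fields are only read and written from the single task that owns the connections container (RefreshState is stubbed here):

use std::collections::{HashMap, HashSet};
use tokio::task::JoinHandle;

// Stub standing in for the PR's RefreshState; its exact contents don't matter here.
enum RefreshState<Connection> {
    InProgress(JoinHandle<()>),
    Refreshed(Connection),
}

struct ConnectionsContainer<Connection> {
    // Addresses with an in-flight refresh task.
    refresh_addresses_started: HashSet<String>,
    // Refresh state per address.
    refresh_operations: HashMap<String, RefreshState<Connection>>,
    // Addresses whose refreshed connection is ready to be installed.
    refresh_addresses_done: HashSet<String>,
}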

@@ -139,6 +146,13 @@ pub(crate) struct ConnectionsContainer<Connection> {
pub(crate) slot_map: SlotMap,
read_from_replica_strategy: ReadFromReplicaStrategy,
topology_hash: TopologyHash,

// Holds all the failed addresses that started a refresh task.
pub(crate) refresh_addresses_started: DashSet<String>,
@barshaul barshaul Jan 6, 2025 (Collaborator)

  1. You can use the completed addresses to hold the newly created cluster node.
  2. Instead of throwing all the refresh structs directly into the connection container, we might be better off wrapping all updates in some struct, so later on we'll be able to add refresh slots or other updates there too.
    For example:
struct RefreshUpdates<Connection> {
    in_progress_addresses: HashSet<String>,
    completed_addresses: HashMap<String, ClusterNode<Connection>>,
    // refreshed_slot_map: Option<SlotMap>,
}

impl<Connection> RefreshUpdates<Connection> {
    pub(crate) fn in_progress(&self) -> &HashSet<String> {
        &self.in_progress_addresses
    }

    pub(crate) fn add_refresh(&mut self, address: String) {
        self.in_progress_addresses.insert(address);
    }

    pub(crate) fn complete_refresh(&mut self, address: String, node: ClusterNode<Connection>) {
        if !self.in_progress_addresses.remove(&address) {
            warn!("Attempted to complete refresh for an address not in progress: {}", address);
        }
        self.completed_addresses.insert(address, node);
    }
    
    pub(crate) fn completed_addresses(&mut self) -> HashMap<String, ClusterNode<Connection>> {
        std::mem::take(&mut self.completed_addresses)
    }
}
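Possible call sites for the wrapper above (function names are hypothetical; this builds on the RefreshUpdates and ClusterNode types already shown):

use std::collections::HashMap;

fn on_connection_failure<Connection>(updates: &mut RefreshUpdates<Connection>, address: String) {
    if !updates.in_progress().contains(&address) {
        // Spawn the background refresh task here, then record the address as in progress.
        updates.add_refresh(address);
    }
}

fn install_refreshed_nodes<Connection>(
    updates: &mut RefreshUpdates<Connection>,
    connection_map: &mut HashMap<String, ClusterNode<Connection>>,
) {
    // Drain the completed refreshes and install the new nodes into the live map.
    for (address, node) in updates.completed_addresses() {
        connection_map.insert(address, node);
    }
}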

None
};
for address in addresses {
if refresh_ops_map.contains_key(&address) {

Let's have an API in the connection_container for "is_reconnecting(address)".
Then you can use filter and for_each:

        addresses
            .into_iter()
            .filter(|address| !connections_container.is_reconnecting(address))
            .for_each(|address| {
                let handle = tokio::spawn(async move {
                    info!("Refreshing connection task to {:?} started", address);
                    ...
                });
            });
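A minimal sketch of such an accessor (RefreshState is stubbed; the field mirrors refresh_operations from the diff above):

use dashmap::DashMap;
use std::marker::PhantomData;

// Stub standing in for the PR's RefreshState.
struct RefreshState<Connection>(PhantomData<Connection>);

struct ConnectionsContainer<Connection> {
    refresh_operations: DashMap<String, RefreshState<Connection>>,
}

impl<Connection> ConnectionsContainer<Connection> {
    // True while a background refresh task for this address is still in flight.
    pub(crate) fn is_reconnecting(&self, address: &str) -> bool {
        self.refresh_operations.contains_key(address)
    }
}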

"Refreshing connection task to {:?} started",
address_clone_for_task
);
let _ = async {
@barshaul barshaul Jan 6, 2025 (Collaborator)

Why do you wrap it in an async block again?

}
debug!("refresh connections completed");
debug!("refresh connection tasts initiated");

Suggested change
debug!("refresh connection tasts initiated");
debug!("refresh connection tasks initiated");

@@ -1798,9 +1858,8 @@ where
if !failed_connections.is_empty() {
Self::refresh_connections(
inner,
failed_connections,
failed_connections.into_iter().collect::<HashSet<String>>(),

It creates an unnecessary copy; instead, you can change failed_connections to be a set to begin with.
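For example, a hedged sketch (the surrounding result type is hypothetical) of collecting the failed addresses into a HashSet up front:

use std::collections::HashSet;

// Collect the failed addresses directly into a HashSet so refresh_connections
// can take ownership of it without an extra conversion pass.
fn failed_addresses(results: &[(String, Result<(), std::io::Error>)]) -> HashSet<String> {
    results
        .iter()
        .filter(|(_, result)| result.is_err())
        .map(|(address, _)| address.clone())
        .collect()
}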

Signed-off-by: GilboaAWS <[email protected]>
This reverts commit 4e6535f.

Signed-off-by: GilboaAWS <[email protected]>
…ils because all connections are dropped and the refresh task takes longer than the request timeout.

Signed-off-by: GilboaAWS <[email protected]>
@GilboaAWS GilboaAWS marked this pull request as draft January 7, 2025 11:48
@GilboaAWS GilboaAWS self-assigned this Jan 7, 2025
…eturned the refresh_connection logic of sending the connection but without removing it from the connection_map

Signed-off-by: GilboaAWS <[email protected]>