When a remote server has been invited to a room and joins it, it may not receive messages that are sent while the local server is processing the send_join. These events are only backfilled later, when the local server sends a new message. This can be quite confusing: you have usually sent some messages and are waiting for a response, but the other side does not have all of them yet, so it might not respond.
I sadly can't include any logs, since those include customer data, but the relevant information seems to be this:
B1 is the delayed event. It isn't sent out, since at that time the remote server is only invited, not joined. However, since the remote server is already processing the join, it only fetches the prev events for B2 and is unaware that B1 exists. Later federation transactions from local to remote don't pick up B1, I think because its stream ID is smaller. That lasts until a new event is created (C in our case). Sending C still doesn't send out B1, even though B1 is one of its prev_events, but it does trigger backfill from the remote server (since C references B1 and B2).
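To make the suspected mechanism concrete, here is a toy model of it. This is not Synapse code: `Event`, `ToyFederationSender`, `simulate`, the server name and the stream numbers are all hypothetical, and the DAG shape (B1 and B2 both pointing at A) is an assumption based on the description above. It only illustrates how a per-destination stream position plus a destination list computed at event creation time could skip B1 for good.

```python
from dataclasses import dataclass


@dataclass
class Event:
    event_id: str
    stream_id: int
    prev_events: list
    # Servers that were joined when the event was created (assumption:
    # the destination list is fixed at that point and never recomputed).
    destinations: set


class ToyFederationSender:
    """Hypothetical sender that walks events in stream order for one destination."""

    def __init__(self):
        self.last_processed_stream_id = 0

    def push(self, events, destination):
        sent = []
        for ev in sorted(events, key=lambda e: e.stream_id):
            if ev.stream_id <= self.last_processed_stream_id:
                continue  # already processed, never looked at again
            self.last_processed_stream_id = ev.stream_id
            if destination in ev.destinations:
                sent.append(ev.event_id)
        return sent


def simulate():
    remote = "remote.example"
    a = Event("A", 1, [], set())
    b1 = Event("B1", 2, ["A"], set())  # remote is only invited here
    b2 = Event("B2", 3, ["A"], set())  # the remote's own join, via send_join
    remote_known = {"A", "B2"}         # what the remote fetched after joining

    sender = ToyFederationSender()
    first_push = sender.push([a, b1, b2], remote)

    # C is created after the join, so the remote is now a destination.
    c = Event("C", 4, ["B1", "B2"], {remote})
    second_push = sender.push([a, b1, b2, c], remote)

    # On receiving C, the remote notices prev events it does not have.
    missing = [p for p in c.prev_events if p not in remote_known]
    return first_push, second_push, missing
```

In this sketch the first push sends nothing, the second push sends only C, and the remote is left with B1 as a missing prev event that it has to backfill, which matches the behaviour described above.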
The sequence of requests around the send_join is:
Received PUT to /_matrix/client/v3/rooms/!room/send/m.room.encrypted/txnid-1 (event B1) [worker d1]
Signing event B1 [worker d1]
Event auth allowing B1 [worker d1]
Start persist_events for B1 [worker d1]
Received make_join from remote server and make_join returns 200 to remote server [worker 0b]
persist_events TXN START and END, outliers get updated [worker d1]
(federation transmission loop finishes sending some presence events to remote [worker b6])
Received request to send_join (event B2) [worker d1]
persist_events TXN START and END for MultiWriterIdGenerator._update_table [worker d1]
Keyring fetch for remote server key [worker d1]
Return 200 for PUT to /_matrix/client/v3/rooms/!room/send/m.room.encrypted/txnid-1 [worker d1]
Remote key fetch done [worker d1]
Verify content hash of B2, on_send_membership_event [worker d1]
calculate state groups for B2 [worker d1]
soft fail checks for B2 [worker d1]
State resolution (1 conflicting entry) [worker d1]
The "might drop extremities" check decides not to drop B1 [worker d1]
Persist outliers B2 [worker d1]
Return 200 to remote for send_join [worker 0b]
(federation transmission loop sends some EDUs to remote [worker b6])
Remote requests state ids for A, the event A and backfills from A [worker 0b]
There are also a lot of EDUs being sent to the remote server all the time, which probably affects this issue somewhat, since sending them will likely also mark a stream ID as processed?
Steps to reproduce
Have a local room.
Invite a remote user.
Have the remote user join exactly when you are sending an event.
The remote server won't receive that event until you send another event later.
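A rough sketch of how one might try to hit the timing window with a script, assuming the invite has already been sent. It uses the standard client-server endpoints for joining and sending; the helper names, the base URLs, the tokens and the use of `m.room.message` instead of `m.room.encrypted` are placeholders of mine. The race is narrow, so this may need many attempts and is not guaranteed to reproduce the problem.

```python
import json
import threading
import urllib.parse
import urllib.request


def build_request(base_url, method, path, token, body):
    """Build an authenticated client-server API request without sending it."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send_path(room_id, txn_id, event_type="m.room.message"):
    room = urllib.parse.quote(room_id, safe="")
    return f"/_matrix/client/v3/rooms/{room}/send/{event_type}/{txn_id}"


def join_path(room_id):
    return "/_matrix/client/v3/join/" + urllib.parse.quote(room_id, safe="")


def race_join_and_send(local_url, local_token, remote_url, remote_token, room_id):
    """Fire the remote user's join and a local send at (nearly) the same time."""
    requests = [
        build_request(remote_url, "POST", join_path(room_id), remote_token, {}),
        build_request(
            local_url, "PUT", send_path(room_id, "txnid-1"), local_token,
            {"msgtype": "m.text", "body": "sent during the join"},
        ),
    ]
    barrier = threading.Barrier(len(requests))

    def fire(req):
        barrier.wait()  # release both requests together
        with urllib.request.urlopen(req) as resp:
            resp.read()

    threads = [threading.Thread(target=fire, args=(r,)) for r in requests]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

After calling `race_join_and_send(...)`, check on the remote side whether the message shows up before any further event is sent from the local server.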
Homeserver
multiple
Synapse Version
1.85.2 (local), 1.107 (remote). Sadly, the server where I managed to track this down at the right time was an old one, but the issue happens at low frequency on a lot of our servers.
Installation Method
Other (please mention below)
Database
postgres, no separate servers
Workers
Multiple workers
Platform
Docker, custom image with a few extra modules
Configuration
A few modules to validate invites, but those weren't executed in the relevant part of the issue.
Relevant log output
I can't provide those at the moment since they include customer data I am not allowed to share, but I included the relevant information in the description, which I hope is sufficient.
Anything else that would be useful to know?
No response