-
Notifications
You must be signed in to change notification settings - Fork 385
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MSC4074: Server side annotation aggregation #4074
base: main
Are you sure you want to change the base?
Changes from 7 commits
19024b4
8e7c5fa
dc97cf1
9b98437
45a3654
8d5ebfb
70f5b66
83e7d0f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,270 @@ | ||
# MSC4074: Server side annotations aggregation | ||
Currently, the specification for [`m.annotation aggregation`](https://spec.matrix.org/v1.8/client-server-api/#server-side-aggregation-of-mannotation-relationships) says that such relations should not be aggregated | ||
on the server side. | ||
|
||
This requires servers to deliver clients all annotations they | ||
receive. In most cases clients need just a number of annotations of | ||
every type. Therefore, delivering aggregated annotations instead of | ||
single events could dramatically reduce server-client traffic in | ||
rooms with many reactions. | ||
|
||
## Background | ||
|
||
Aggregation is a process when client and/or server "summarises" | ||
together many events of the same type. | ||
|
||
In early specification versions there was only client-side | ||
aggregation and client was solely responsible for receiving and | ||
summarizing all events to make a compact "view" of them. | ||
|
||
Later server side aggregation was introduced with idea to: | ||
- decrease client-server traffic for cases when clients do not need | ||
all events of the same type, but can know only their count | ||
- deliver summary of associated children events together with their | ||
parent event given that they can be significantly separated in | ||
timeline | ||
|
||
Given 1000 users reacted with "👍" annotation to event A, | ||
it is normally sufficient for clients to know that there were 1000 | ||
reactions of that type "👍" to event A. | ||
|
||
If clients need to get the exact list of users who reacted and | ||
additional information about these 1000 reactions, clients can use | ||
existing [`relations (relationships) API`](https://spec.matrix.org/v1.8/client-server-api/#relationships-api) | ||
|
||
Servers tried to provide clients with server side generated | ||
annotation aggregates before, but these attempts were not successful | ||
mainly because both sides (server and client) tried to | ||
aggregate and the same time. That used to lead to the situation when | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. and what at the same time? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. corrected |
||
given the potentially outdated aggregate, clients could not | ||
understand which events had already been included into the server | ||
provided aggregate and which - not. | ||
|
||
## Proposal | ||
|
||
Clients will not be responsible for aggregation of relations of | ||
"m.annotation" relation type anymore. Such relations will always be | ||
aggregated by the server. The only exception to this rule is E2E | ||
encrypted events, which should be solely aggregated by the client. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does the encryption condition only apply once reactions are encrypted? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. rephrased to make it clearer |
||
|
||
When there is a mix of encrypted and non-encrypted events, client | ||
should merge its encrypted annotations aggregate with the unencrypted | ||
provided by the server. Given the two sets of events (encrypted, | ||
unencrypted) do not overlap, no events will be counted twice. | ||
|
||
Client API will always deliver annotations aggregates, including both | ||
local and known federation events. Homeserver will be responsible | ||
for counting annotations correctly. | ||
|
||
Whenever requested, server will always provide up-to-date (or nearly | ||
up-to-date) aggregates of "m.annotation" via [`messages`](https://spec.matrix.org/v1.8/client-server-api/#get_matrixclientv3roomsroomidmessages), | ||
[`relations`](https://spec.matrix.org/v1.8/client-server-api/#relationships-api), [`get event by id`](https://spec.matrix.org/v1.8/client-server-api/#get_matrixclientv3roomsroomideventeventid) and | ||
similar client api endpoints. | ||
|
||
For server-server API, homeserver should provide aggregates, | ||
including only local events (events created and signed by self). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not following this bit. Servers distribute the events as-is without consideration for the underlying feature, meaning each server should be able to determine how many of each event there is. Reactions can also be used as a form of "DAG healing" when rooms get complex. This is because they are normally higher traffic than regular messages, giving more opportunity for the DAG to heal extremities. The delay associated with needing to process these complicated event trees is also (typically) less critical than message traffic. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. extended the comment to clarify the idea: severs do not filter aggregated relations (no issue with the problem you described), but attach aggregates with counters to parent events they deliver (e.g servers A and B federate room R. server A users liked event E 100 times. server B users liked event E 20 times. when server A delivers E event PDU, it comes with counter 100, server B knows that there are additional 20 'local' likes and can deduct/merge 100+20 = 120.); If this point seems arguable, we can exclude the paragraph completely |
||
|
||
Servers are allowed to limit maximum number of different "m.annotation" | ||
keys they aggregate to a single event by a reasonable value (that | ||
corresponds to a limited number of different reaction kinds, e.g. "👍", | ||
"👎"). This is needed to avoid the situation when malicious users may | ||
attack server creating jumbo-events (events with arbitrary high count of | ||
different reaction kinds). Servers may configure this threshold on their | ||
discretion, but this number should not be lower than 16. When annotation | ||
key limit is reached, no new keys should be added to the aggregate, but | ||
older keys would continue to be aggregated (change their count). | ||
Annotations beyond that threshold should still be available via | ||
[`relations`](https://spec.matrix.org/v1.8/client-server-api/#relationships-api) endpoints. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How would a server communicate that it hit the limit? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I completely removed this clause. Potentially this problem already exists now, but only on the client side. I think it would better get addressed separately in order not to overcomplicate this change proposal (already big enough and technically complex) |
||
|
||
## Aggregate formats | ||
|
||
Full aggregate format (with parent ClientEvent/PDU): | ||
``` | ||
{ | ||
"event_id": "$my_event", | ||
... | ||
"unsigned": { | ||
"m.relations": { | ||
"m.annotation": [ | ||
{ | ||
"key": "👍", | ||
"origin_server_ts": 1562763768320, | ||
"count": 3, | ||
"current_user_annotation_event_id": "$bar", // optional field | ||
}, | ||
{ | ||
"key": "👎", | ||
"origin_server_ts": 1562763768320, | ||
"count": 1, | ||
"current_user_annotation_event_id": "$foo", // optional field | ||
} | ||
], | ||
... | ||
} | ||
} | ||
} | ||
``` | ||
|
||
Partial aggregate format (EDU): | ||
``` | ||
{ | ||
"type": "m.reaction", | ||
"content": { | ||
"m.relates_to": { | ||
"rel_type": "m.annotation", | ||
"event_id": "$parent_event_id", | ||
"key": "👍" | ||
"current_user_annotation_event_id": "$foo", // optional field | ||
"origin_server_ts": 1562763768320, | ||
} | ||
}, | ||
"unsigned": { | ||
"annotation_count": 1234, | ||
} | ||
} | ||
``` | ||
|
||
## Aggregate updates | ||
|
||
All client api endpoints but [`/sync`](https://spec.matrix.org/v1.8/client-server-api/#get_matrixclientv3sync) should deliver events | ||
enriched with updated aggregates whenever requested (as part of their | ||
regular response) | ||
|
||
[`/sync`](https://spec.matrix.org/v1.8/client-server-api/#get_matrixclientv3sync) should behave the following | ||
way: | ||
- when initial sync (sync without position) is queried, events should | ||
contain up-to-date aggregates | ||
- when sync with a token is called and there are no aggregate updates | ||
"prior" to the token, new events (after the token) with up-to-date | ||
aggregates should be delivered | ||
- given the current time Tn, when sync with a token T is called and there | ||
are aggregate updates of event E with stream position T-1 happened | ||
between T and Tn, sync should a. re-deliver event E with updated | ||
aggregates OR b. deliver EDU of the particular annotation key updated | ||
with the up-to-date count | ||
- given the same as above, but with limited sync response, aggregate | ||
updates should be sent to client with higher priorities than the | ||
rest of the sync response (normal timeline events) | ||
- whenever receiving an event with full "m.annotation" aggregate, clients | ||
should overwrite their annotation aggregate they have at the time of | ||
receiving. whenever receiving a partial (EDU) aggregate update for a | ||
certain annotation key, clients should overwrite (replace) aggregate | ||
state for that particular annotation key. | ||
- if there are no other sync updates than aggregate updates at the time | ||
of sync request, aggregate updates should not trigger immediate sync | ||
response | ||
- given the client is waiting for updates on long polling, aggregate | ||
updates should not interrupt long polling. aggregate updates should only | ||
be included into sync response, whenever long polling timeout is reached | ||
|
||
## Filtering out aggregated annotations | ||
|
||
`RoomEventsFilter` format should be extended to include new filter param | ||
`filter_server_aggregated_relation_types` ([]string). | ||
Server should filter out events having relation types to their parents | ||
specified in this array if these events are aggregated by the server. | ||
Filtering should work for the main and thread timelines. | ||
Such filtered events should only be delivered in their full form | ||
whenever requested explicitly via client API ([`relations`](https://spec.matrix.org/v1.8/client-server-api/#relationships-api), | ||
[`get event by id`](https://spec.matrix.org/v1.8/client-server-api/#get_matrixclientv3roomsroomideventeventid),[`context`](https://spec.matrix.org/v1.8/client-server-api/#get_matrixclientv3roomsroomidcontexteventid)) or via server-server API | ||
(federation API) | ||
|
||
|
||
## Benefits of the proposal | ||
|
||
- Room timeline would be cleared from many barely meaningful events in | ||
hot rooms | ||
- Traffic would potentially be saved on mobile devices | ||
- Client devices would not need to have such a big local storage (event | ||
ids + some metadata for every annotation) | ||
- When closing "gaps" for limited [`/sync`](https://spec.matrix.org/v1.8/client-server-api/#get_matrixclientv3sync) response and paginating | ||
over [`/messages`](https://spec.matrix.org/v1.8/client-server-api/#get_matrixclientv3roomsroomidmessages), clients | ||
would not need to load many annotations page-by-page which only end up | ||
showing one number somewhere (e.g. iterating many pages of content just | ||
to show "👍: 1000") | ||
- Pagination "forward" using [`/messages`](https://spec.matrix.org/v1.8/client-server-api/#get_matrixclientv3roomsroomidmessages) would | ||
provide correct annotation count together with every event (no need to | ||
go to the end of the timeline to aggregate everything to show correct | ||
annotation count for a certain event). This is a solution of the problem | ||
that given page cannot potentially contain all reactions to a given | ||
event, which happened in the "future" (would belong to the "next" pages | ||
otherwise). | ||
|
||
Although E2E encrypted annotations are still aggregated by the client | ||
in this proposal, massive scale public rooms will unlikely have | ||
strict security requirements and reactions could be left unencrypted | ||
there. Smaller rooms in their turn where stricter security might be | ||
needed can still use E2E encrypted annotations, but performance and | ||
scalability will not be a concern in this case. | ||
|
||
An advantage of the proposed approach, among others, is that clients | ||
can decide what security levels they want. | ||
|
||
## Potential issues | ||
|
||
1. Older client SDK used a custom not described in the specification | ||
aggregate format introduced in Synapse implementation with "chunked" | ||
annotation aggregates. This aggregate format is incompatible with the | ||
proposed format. | ||
2. Existing client behavior is client-side aggregation of "m.annotation" | ||
relations. | ||
|
||
In order not to silently break clients with the new server side | ||
aggregation, new annotation filtering behaviour should be explicitly | ||
requested by clients via the added | ||
`filter_server_aggregated_relation_types` filtering param. | ||
This filtering param can later be reused for the same purpose | ||
to hide other server aggregated events as soon as more relation type | ||
aggregates are supported. | ||
|
||
## Alternatives | ||
|
||
1. Not to aggregate annotations on the server side (as of now) | ||
This limits room scalability for large rooms, where people potentially | ||
react more frequently than produce content. There are also use case | ||
scenarios like read-only rooms with many users, where users can read, | ||
react on events created by room admins (e.g.) | ||
2. Use the former "chunked" format, which was previously used by synapse | ||
and later obsoleted. This format provides extra "prev" and "next" | ||
tokens, "chunk" of annotations and largely duplicates new "relations" | ||
endpoints which had not existed when the format was introduced. With the | ||
current specification version this extra functionality seems redundant | ||
and would just overcomplicate server side aggregation implementation. | ||
Whenever clients need, they can always iterate over annotations | ||
explicitly requesting /relations endpoints to get non-aggregates view of | ||
relations they are interested in. | ||
|
||
### Client opt-in | ||
|
||
The proposed change is fully backwards compatible. Clients supporting the | ||
change will be able to opt-in and pass | ||
`filter_server_aggregated_relation_types` param via `RoomEventsFilter` | ||
|
||
## Security considerations | ||
|
||
Server cannot aggregate E2E encrypted annotations. In order to make | ||
annotation aggregation work in E2E rooms, such annotations should | ||
be sent unencrypted. | ||
Annotations do not normally contain security sensitive data, and this | ||
limitation should not be significant for most of the cases. | ||
|
||
In order to provide a workaround for cases when stricter security is | ||
important, encrypted annotations should be aggregated by the client. | ||
|
||
To make this process work, server would not filter out encrypted | ||
annotations from the main and thread timelines by default and deliver | ||
them to clients. Clients aggregate only encrypted annotations and | ||
apply their aggregate on top (in addition) of the aggregate server | ||
may already provide to event. Such aggregates would never overlap | ||
as server never aggregates encrypted events and simple deterministic | ||
logic to merge server and client aggregates exists in this case. | ||
|
||
## Unstable prefix | ||
|
||
No new identifiers are proposed; it is proposed that servers implementing | ||
this | ||
proposal simply do so on the existing endpoints. | ||
Comment on lines
+334
to
+336
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. prefixed with |
||
|
||
## Dependencies | ||
|
||
None. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation requirements:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the expectation here?
?
Or is it just a ref comment for other contributors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The third option, a working client/server implementation somewhere. MSCs must be proven with implementations before they can be accepted: https://spec.matrix.org/proposals/#implementing-a-proposal
It generally doesn't matter who implements it or where, so implementation requirements are commented as a reference for everyone