MSC4083: Delta-compressed E2EE file transfers #4083
`GET /_matrix/media/v3/download/matrix/org/n3wv3rs10n?delta_base=mxc://matrix.org/b4s3v3rs10n`

This would return an ordered multipart download of the deltas (once unencrypted, if needed) to apply to the base-version
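A minimal sketch of how a client might build that request path from two `mxc://` URIs, assuming the query-parameter form quoted above and the usual `{serverName}/{mediaId}` download layout (so the output reads `matrix.org/...`). `parse_mxc` and `delta_download_path` are hypothetical helper names, not something the MSC defines.

```python
from urllib.parse import urlencode, urlsplit


def parse_mxc(mxc: str) -> tuple[str, str]:
    """Split an mxc:// URI into (server_name, media_id)."""
    parts = urlsplit(mxc)
    media_id = parts.path.lstrip("/")
    if parts.scheme != "mxc" or not parts.netloc or not media_id:
        raise ValueError(f"not a valid mxc URI: {mxc!r}")
    return parts.netloc, media_id


def delta_download_path(new_mxc: str, base_mxc: str) -> str:
    """Build the download path for new_mxc, asking for deltas against base_mxc.

    Assumption: delta_base is passed as a plain query parameter, as in the
    quoted proposal text.
    """
    server, media_id = parse_mxc(new_mxc)
    # Keep ':' and '/' unescaped so the mxc URI stays readable, as in the quote.
    query = urlencode({"delta_base": base_mxc}, safe=":/")
    return f"/_matrix/media/v3/download/{server}/{media_id}?{query}"
```

The multipart response itself is not covered here, since the quoted hunk does not say how its parts are framed.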
For encrypted files, how do clients discover the encryption keys for each delta and the base file?
i just realised the same thing :) i guess this pushes it back towards putting the delta links on the m.file events rather than the content repository, and using aggregations perhaps as a way to grab all the events needed to download a given file.
Alternatively, could be evil and specify the same IV & Key for every event which is a diff on a given file - but calculate the actual IV used to encrypt/decrypt the diff as IV' = H(IV, $content_id). This would mean that diffs have to be created as async uploads so you know their content_id before they can be encrypted by the client though; and the multipart download would have to include content IDs.
I'm not convinced this is better than using an aggregation API to say "give me all the events for the diffs needed to construct this $event_id", and then firing off a tonne of parallel reqs to the media repo to grab the required media files (which is arguably only 2 roundtrips too). But it avoids having to fiddle around with events at all.
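Purely as a sketch of the IV' = H(IV, $content_id) idea: the comment above doesn't say which hash H is, so this assumes HMAC-SHA-256 keyed with the shared IV and truncated to a 16-byte AES block. `derive_diff_iv` is a hypothetical name, and any counter layout the cipher mode expects inside that block is left out.

```python
import hashlib
import hmac


def derive_diff_iv(shared_iv: bytes, content_id: str) -> bytes:
    """Derive a per-diff IV from the shared IV and the diff's content ID.

    Assumption: H is HMAC-SHA-256 truncated to 16 bytes; the thread does
    not specify the hash, so this is only one possible instantiation.
    """
    digest = hmac.new(shared_iv, content_id.encode("utf-8"), hashlib.sha256).digest()
    return digest[:16]
```

The derivation is deterministic, so anyone holding the shared IV and the content ID from the multipart response could recompute IV' without extra key material per diff.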
* `delta_base` is the mxc URL of the content the delta applies to
* `delta_format` is the file format of the binary diff
* This MSC defines `m.vcdiff.v1.gzip` to describe gzipped RFC3284 compatible binary VCDIFF payloads, picked for
Suggested change:
- * This MSC defines `m.vcdiff.v1.gzip` to describe gzipped RFC3284 compatible binary VCDIFF payloads, picked for
+ * This MSC defines `m.vcdiff.v1.gzip` to describe gzipped [RFC3284](https://datatracker.ietf.org/doc/html/rfc3284) compatible binary VCDIFF payloads, picked for
computation efficiency rather than patch size (whereas bsdiff + bzip might provide better patch size at worse
computation complexity; other MSCs are welcome to propose different diff formats).
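A rough sketch of how a client might unwrap a `m.vcdiff.v1.gzip` payload before handing it to a VCDIFF decoder. It only gunzips the payload and checks the three magic bytes that RFC 3284 puts at the start of a VCDIFF stream; `unwrap_delta` is a hypothetical helper, and applying the delta needs a separate VCDIFF library, which Python's standard library does not include.

```python
import gzip

# RFC 3284: a VCDIFF stream starts with 'V', 'C', 'D' with the high bit set.
VCDIFF_MAGIC = b"\xd6\xc3\xc4"
DELTA_FORMAT_VCDIFF_GZIP = "m.vcdiff.v1.gzip"


def unwrap_delta(payload: bytes, delta_format: str) -> bytes:
    """Return the raw VCDIFF bytes for an (already decrypted) delta payload."""
    if delta_format != DELTA_FORMAT_VCDIFF_GZIP:
        raise ValueError(f"unsupported delta_format: {delta_format!r}")
    raw = gzip.decompress(payload)
    if not raw.startswith(VCDIFF_MAGIC):
        raise ValueError("payload is not a VCDIFF stream")
    return raw
```

Rejecting unknown `delta_format` values up front leaves room for the other diff formats that later MSCs might define.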

Clients should upload a new snapshot of a piece of content if the sum of the deltas relative to the last snapshot
Clients must also upload a new snapshot when needed to ensure that secrecy is preserved in encrypted rooms. For example, if a new user joins, a new snapshot must be uploaded; otherwise the new user would need to be able to decrypt the file state from before they joined the room.
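A small sketch of how the two snapshot triggers could combine on the client. The quoted MSC line is cut off before it states the size condition, so `size_limit` here is a caller-supplied placeholder rather than the MSC's threshold, and `needs_new_snapshot` is a hypothetical name.

```python
from typing import Iterable


def needs_new_snapshot(
    delta_sizes: Iterable[int],
    size_limit: int,
    joined_since_snapshot: bool,
) -> bool:
    """Decide whether to upload a full snapshot instead of another delta.

    Assumption: size_limit is chosen by the caller; the excerpt does not
    show what the MSC compares the summed delta size against.
    """
    # A join since the last snapshot forces a fresh one, so the new member
    # never needs keys for file state from before they joined.
    if joined_since_snapshot:
        return True
    return sum(delta_sizes) > size_limit
```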
file, and then want to express a small change to it (e.g. using the editor to transform part of the scene graph). Or
you might want to store a change to a markdown or HTML file.

Currently, your only option is to save a whole new copy of the file - or invent your own delta-compression scheme at
Couldn't you also use an existing delta format like the one used by OTA updates on Android and encrypt that separately here? Or is the concern that due to e2ee shenanigans, intermediate deltas are lost here? (I am not saying that this is a good approach. Just an alternative that also came to mind for me)
but this proposal does propose using an existing delta format (vcdiff - rfc3284) and encrypting the diffs?
I must have been asleep while writing this comment I guess 😱 sorry.
A rough proposal for delta-compressing file transfers, originally written for Third Room, but apparently i never committed it at the time - so submitting it as an MSC for posterity, in the hope that it saves some time the next time someone wants to do incremental binary updates against a file in Matrix. (@hughns: should we ever get back to the Matrix Files SDK, this might be of interest)
Rendered