Media in the content repo is not authed #870

Closed
kethinov opened this issue Apr 22, 2017 · 57 comments · Fixed by #1858
Labels
A-Media-Repository, feature (Suggestion for a significant extension which needs considerable consideration), security

Comments

@kethinov

For example, this image was shared in a private three-person chat, but anyone can view it: https://matrix.org/_matrix/media/v1/download/matrix.org/bSRWdHBFqtVzowZDhwRGbzDq

Most people I've recruited into Matrix are Google Hangouts refugees looking for an open platform. On Hangouts, you cannot view the web URL of an image in this way unless you're authenticated with the server and the user has shared it with you in a chat.

Would it be possible to support moving past security through obscurity at some point? Or, failing that, at least expire the images after a week or so?

This is concerning because it would be rather trivial for someone to write a simple app querying random alphanumeric strings to harvest images people have shared in private conversations.

@mphara8437

mphara8437 commented Jul 19, 2017

This is no more security through obscurity than any other key-based authentication mechanism; it is called URL-based authentication. The key in your example, bSRWdHBFqtVzowZDhwRGbzDq, is 24 characters long and uses upper- and lower-case letters, which gives 52^24 possibilities, more than 128 bits.

But let's work through your concern.

Imagine we write your trivial app and start it running...

Assuming the CDN can store 1 PB (petabyte) and the average image size is 1 KB, that's a trillion images (10^12 or 1,000,000,000,000).

Let's assume that you have a really high-speed Internet link and the CDN will let you do 10^8 (100,000,000) queries per second. tcpdump says that a single query is 4.7 KB, so we're doing 470 GB of traffic every second, and apparently both your link and the server are able to handle 3760 Gbps.

Let's say that no one notices that the server is getting hit by a denial-of-service attack 6 times larger than anything ever seen before, and they let you keep going for 10 years (60*60*24*365*10).

52^24/10^12/10^8/(60*60*24*365*10)

At this point we can determine that you have a 1 in 4,844,775,310,744 chance of getting a random Cat pic...
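The arithmetic above can be checked with a few lines of Python. This is only a sketch that mirrors the stated assumptions (52-symbol alphabet, 24-character IDs, 10^12 stored images, 10^8 queries per second for ten years); the variable names are made up for illustration.

```python
import math

ALPHABET = 52                       # upper- and lower-case letters
ID_LENGTH = 24                      # characters in the example media ID
STORED_IMAGES = 10**12              # 1 PB of 1 KB images
QUERIES_PER_SECOND = 10**8
SECONDS = 60 * 60 * 24 * 365 * 10   # ten years

keyspace = ALPHABET ** ID_LENGTH
bits = ID_LENGTH * math.log2(ALPHABET)
total_guesses = QUERIES_PER_SECOND * SECONDS

# Mirrors the expression 52^24/10^12/10^8/(60*60*24*365*10)
odds = keyspace // (STORED_IMAGES * total_guesses)

print(f"keyspace is about 2^{bits:.1f}")
print(f"roughly 1 in {odds:,}")
```

The result lands at about 4.8 trillion, in line with the figure quoted above.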

Meanwhile you have a better chance of getting struck by lightning... while drowning at 1 in 183 million.

Personally I would be more concerned about someone walking up to the server and stealing it... or the server gets hacked due to a bug somewhere... which is why you should be using encrypted chat...

This is what an image looks like when it is sent to a group using encrypted chat:

https://matrix.org/_matrix/media/v1/download/matrix.org/qctIqdoPymLbqdNpOkWZGtvo

If you grab this file (which was a JPEG of a cat) you will notice that it is encrypted.

@kethinov
Author

It's still less secure than Hangouts et al though because it only requires correctly guessing one key rather than two or more.

To access a privately shared image via Hangouts, you'd have to gain access to a whole account that has been granted permission to view the image, so you'd have to know both the username and the password, which is much harder to randomly guess.

Moreover, some accounts are configured with 2FA, further increasing the security.

This implementation is far from that, and I think addressing this would be worth doing at some point.

@mphara8437

My understanding of your concern was that the media IDs generated by Synapse left users of Synapse open to a brute-force keyspace attack using a simple app (an understandable concern).

The Matrix specification does not define the media ID keyspace, so the keyspace could easily be enlarged to increase security if required.

However, a keyspace attack against the Synapse content repository API implementation is already infeasible, so no change is necessary.

Synapse is the reference implementation for the Matrix specification and adding user authentication to the content repository API would require a change to the Matrix Specification.

To propose changes to the Matrix Specifications see the following:

https://github.com/matrix-org/matrix-doc/blob/master/CONTRIBUTING.rst

PS If you are concerned about privacy, use encryption.

@kethinov
Author

Encryption is nice, but if I have your file, I could deploy infinite time and resources to brute force that encryption. What you want is to make it as hard as possible for me to get your file in the first place, then encrypt it on top of that. That's why people are so reticent to hand over their phones or laptops to border patrol even when they use full disk encryption. Physical security matters perhaps even more than encryption.

As such, what concerns me here is it's so easy to gain physical access (in a sense) to random people's files by stumbling on a random file just by guessing a single key, rather than having to match at least two matching pairs. In other image sharing services, there are similar long, unique keys to access the image itself, but in addition to that you need to present valid account credentials and that account has to have been given explicit permission to view that image.

I do think it would be prudent to add those additional layers of security here.

@taurhine

I totally agree with kethinov. I can imagine deploying fail2ban on the server to monitor 404 errors would slow down the attacker but still does not solve the main issue.

@uhoreg
Member

uhoreg commented Oct 16, 2017

dup of matrix-org/synapse#1403

@richvdh changed the title from "Images shared inline are not private" to "Media in the content repo is not authed" on Oct 16, 2017
@richvdh
Member

richvdh commented Oct 16, 2017

See also https://github.com/matrix-org/matrix-doc/issues/701 for the spec issue here.

@benqrn

benqrn commented Oct 16, 2017

It is highly unlikely that someone could guess the media URL; the key in each media link is long enough to prevent guessing. The more likely attack vector is obtaining the URL directly somehow: it is accidentally posted into a channel, someone who already has the link shares it without permission, a browser toolbar scrapes your URL entries without your knowledge, another person in the channel has malware that sends away data it collects from a channel they share with you, and so on.

@turt2live
Member

Crossposting for the purposes of visibility (source):

I don't think this has been answered somewhere, so asking here in hopes people have ideas: How would federated media work?

In theory the server could start signing requests to download media, although that doesn't really guarantee that the person making the request is allowed to do so (ie: is in the room). With the upcoming introduction of users being linked to key-like objects, we could possibly use those to sign the requests, however there's nothing to stop a server lying about which user is requesting the media.

Then there's the question of the user potentially wanting specific media being publicly accessible. The primary use case being the IRC bridge which pastebins long messages.

@ara4n
Member

ara4n commented Jun 4, 2018

So this comes up on a regular basis, especially from corporate security folks who don't like the idea that a URL leaked in HTTP logs (or proxy logs) etc could then be simply curl'd by any random user to access the content. It's not a matter of the chances of guessing the URL correctly (or the chances of being hit by lightning) but instead whether an attacker who does manage to get the URL automagically gets access to the content too.

One thing we could do is to auth access to the content itself, but this means tracking the event(s) that the content is referenced by and in turn which users have access to those events and so can view the content. This is a potentially nasty leak of metadata for e2e attachments which we don't currently have otherwise. (It's possible we might need this for quotas as per matrix-org/synapse#3339, but hopefully not). It's also quite heavy for the media repo to have to check auth rules for a room for every piece of content that is viewed (and is a bit unfortunate if the media repo is otherwise independent of the room server).

An alternative naive solution could be to just track a random bearer token alongside each mxc:// URL for each piece of content, stored in the event and in the repo. Clients would then submit this bearer token as Authorization: Bearer <secret> whenever they query the repo, meaning that URLs can't be simply copy-pasted around the place unless the auth token is also provided. This might be enough, in practice?
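A minimal sketch of that bearer-token idea, assuming an in-memory store. The `MediaRepo` class, its method names, and the token length are hypothetical and are not part of Synapse or the Matrix spec; the point is only that the mxc:// URI alone is no longer enough to fetch the content.

```python
import hmac
import secrets
from typing import Dict, Optional, Tuple


class MediaRepo:
    """Toy in-memory media store; all names here are hypothetical."""

    def __init__(self) -> None:
        # mxc URI -> (secret token, content bytes)
        self._blobs: Dict[str, Tuple[str, bytes]] = {}

    def upload(self, mxc_uri: str, content: bytes) -> str:
        # A fresh random secret is stored next to the content; the
        # sender would also place it in the event that references it.
        token = secrets.token_urlsafe(32)
        self._blobs[mxc_uri] = (token, content)
        return token

    def download(self, mxc_uri: str, authorization: Optional[str]) -> bytes:
        entry = self._blobs.get(mxc_uri)
        if entry is None:
            raise KeyError("unknown media")
        token, content = entry
        expected = f"Bearer {token}".encode()
        supplied = (authorization or "").encode()
        # Constant-time comparison of the Authorization header value.
        if not hmac.compare_digest(supplied, expected):
            raise PermissionError("missing or wrong bearer token")
        return content
```

A client would call `download(uri, f"Bearer {token}")`; a copy-pasted URI without the header is rejected.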

I think there was also another solution involving HMACs (which I think is how we did it pre-Matrix?), but I can't remember how that worked. @erikjohnston any idea?

Edit: we could of course also mandate that the user has a valid access_token for the server too when they are accessing the media repo, although that doesn't lock access to any particular piece of content.

@ara4n
Member

ara4n commented Jun 4, 2018

@turt2live did you have any ideas on how this should/could work?

@turt2live
Member

Not too much beyond the verbose spiel above (which ends with "I have no idea"). In any case, we should consider having a way for users/bridges/bots to say "this is supposed to be unauthed" via the API for things like the IRC bridge.

How insane would it be to always end to end encrypt media regardless of room?

@turt2live
Member

On second thought, encrypting everything doesn't really help. The authorization token probably makes the most sense, although I'm curious as to how the HMAC stuff would work.

@uhoreg
Member

uhoreg commented Jun 4, 2018

For bridges, I suspect that users will end up having to request the file using a URL from the bridge, and the bridge would have to do the auth dance. Maybe we could add an endpoint that returns a time-limited download URL that the bridge can 302 the user to, so that it won't have to proxy the whole file. This would still allow the bridge to check that the original event hasn't been redacted.
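One way such a time-limited download URL could work is an HMAC-signed expiry in the query string. This is only a sketch under that assumption: the signing key, the parameter names `expires` and `sig`, and both function names are invented for illustration, and no such endpoint is defined in this thread.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SIGNING_KEY = b"server-side-secret"  # hypothetical; known only to the media repo


def _sign(media_id: str, expires: int) -> str:
    message = f"{media_id}:{expires}".encode()
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()


def make_download_url(base: str, media_id: str, ttl_seconds: int, now=None) -> str:
    """Build a URL that stops verifying after ttl_seconds."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _sign(media_id, expires)})
    return f"{base}/{media_id}?{query}"


def verify_download_url(url: str, now=None) -> bool:
    """Check the signature and the expiry of a URL from make_download_url."""
    parsed = urlparse(url)
    media_id = parsed.path.rsplit("/", 1)[-1]
    query = parse_qs(parsed.query)
    try:
        expires = int(query["expires"][0])
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    return hmac.compare_digest(sig.encode(), _sign(media_id, expires).encode())
```

In this sketch the bridge would check that the event is still unredacted, ask for a signed URL, and redirect the user to it, so the file itself never passes through the bridge.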

@MurzNN
Contributor

MurzNN commented Jun 5, 2018

Maybe investigate how this is done in Hangouts?

@ara4n
Member

ara4n commented Jun 5, 2018

Alternatively, the bridge could deliberately expose the URL with a ?secret=... querystring rather than an Auth header if it's intended to be accessible by the general public. (In addition, we /could/ track whether a given MXC should be world-readable or not in the media repo DB, or whether it should require an access_token for access (in addition to the secret).)

@erikjohnston
Member

It's worth noting that we probably want to support being able to open media in a separate window, e.g. to view large images or PDFs etc., and I don't think you can make the browser add auth headers in those cases.

@ara4n
Member

ara4n commented Jun 5, 2018

there are ways of fixing that - e.g. have the client download the content itself with the right headers and then expose it to the user as a blob URL, which can then be viewed in separate windows/tabs etc.

@ara4n
Member

ara4n commented Jun 5, 2018

I think there was also another solution involving HMACs (which I think is how we did it pre-Matrix?), but I can't remember how that worked. @erikjohnston any idea?

Turns out that the way we used to do it was to never send access_tokens in requests at all, but send an HMAC(method, url, access_token) and then use the access_token as a shared secret, so that a leaked URL wouldn't leak an individual user's access_token. I assume we didn't do this for Matrix because calculating that HMAC would be too onerous for trivial HTTP clients, hence passing raw access_tokens around. In practice it doesn't buy us anything in this instance, as the resulting URL could still be passed blindly around anyway; we might as well create a new random secret for each URL and use that instead.
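For illustration, request signing of the HMAC(method, url, access_token) kind might look like the sketch below. The comment does not say how the fields were combined or which hash was used, so the message layout, SHA-256, and the function names are all assumptions.

```python
import hashlib
import hmac


def sign_request(method: str, url: str, access_token: str) -> str:
    """Return a signature to send instead of the raw access token."""
    # The access token acts as the HMAC key (the shared secret) and
    # never appears in the request itself.
    message = f"{method.upper()} {url}".encode()
    return hmac.new(access_token.encode(), message, hashlib.sha256).hexdigest()


def verify_request(method: str, url: str, signature: str, access_token: str) -> bool:
    """Server side: recompute the signature with the stored token."""
    expected = sign_request(method, url, access_token)
    return hmac.compare_digest(expected.encode(), signature.encode())
```

A leaked signed URL reveals only the signature for that one method and URL, not the token, which matches the point made above: it protects the token but does nothing to stop the signed URL itself being passed around.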

@richvdh
Member

richvdh commented Jun 5, 2018

(cf https://github.com/matrix-org/matrix-doc/issues/1043 for "access tokens suck")

@user318

user318 commented Jun 5, 2018

What if each user got a unique link to the media, or maybe a common link with a personal auth token based on their ID? When the media is accessed, the server could check that the access token is correct for the user and that the user is authenticated.

@uhoreg
Member

uhoreg commented Jun 5, 2018

In reply to @ara4n:

alternatively, when the bridge could deliberately expose the URL with a ?secret=... querystring rather than an Auth header if it's intended to be accessible by the general public.

The reason that I suggested having the bridge do the auth dance, rather than forwarding the secret in the querystring, was so that a file that's redacted Matrix-side would become inaccessible to bridged users.

(In addition, we /could/ track whether a given MXC should be world-readable or not in the media repo DB, or whether it should require an access_token for access (in addition to the secret))

I would just say that a file can be uploaded with a token or without a token. If it's uploaded with a token, then downloads need to be authed; if it's uploaded without a token, then it's a free-for-all.

In reply to @user318

What if each user would get its unique link to media or may be a common link with personal auth token, based on his id.

That doesn't really work with end-to-end encrypted files, as the server doesn't get to see what file IDs are visible to what users.

@user318

user318 commented Jun 5, 2018

That doesn't really work with end-to-end encrypted files, as the server doesn't get to see what file IDs are visible to what users.

I do not actually know how it works in e2e. I thought files were embedded there as base64-encoded messages rather than stored as media.

@uhoreg
Member

uhoreg commented Jun 6, 2018

Messages have a size limit, so you can't store files within the message itself. You also don't want to send the whole file to everyone until they request it. e2e file events are basically just pointers to an encrypted blob in the media store, along with the decryption key.

@ara4n
Member

ara4n commented Jun 7, 2018

I've written a spec proposal for solving this over at https://github.com/matrix-org/matrix-doc/issues/701, review welcome on the googledoc.

@dr1

dr1 commented Jun 7, 2018

Is matrix-org/synapse#1263 going to be taken care of with this change as well? I'm only seeing concerns about GDPR erasure, which I presume means when someone deactivates and deletes their account. Right now it's fairly easy to have a tragedy if an inappropriate attachment link goes out over a bridge.

@nunoperalta

Reading this thread, it appears most people mentioned brute force attacks or someone providing the URL to other people.

What I'm really concerned about is Google or other search engines somehow ending up indexing these images, because they are, after all, public URLs.

If someone posts the URL in public (like the OP of this thread), the image may potentially become indexed.

This issue is an important one that needs to be resolved, especially on a project that treats encryption and privacy as high priorities :)

@clokep
Member

clokep commented Jul 29, 2021

I'm going to move this to the matrix-doc repo since this would need to be specced before synapse can implement anything.

@clokep transferred this issue from matrix-org/synapse on Jul 29, 2021
@clokep
Member

clokep commented Jul 29, 2021

And now that we've transferred it it seems that matrix-org/matrix-spec-proposals#3796 is the duplicate for this.

@clokep closed this as completed on Jul 29, 2021
@davralin

Avatars are concerned too (https:///_matrix/media/r0/thumbnail//xxxxxxxxxxxxxx?width=400&height=400&method=crop).
They may be cat pictures, but they can also be personal data such as a face.

For software that claims privacy and security, it is odd not to authenticate or require an access token for such requests.

A conversation in a public space is still public, even if the conversation is between three people.

If the conversation should be secret, or the participants always want privacy, they can choose to encrypt all the communication.

That renders the media URL useless, as all you can get from the link is an encrypted blob, as pointed out with the linked cat picture.

In many ways the conclusion is simple:

  • Are the conversation and all attachments public by nature? Don't encrypt.
  • Are they not? Use end-to-end encryption.

There's no reason to trust the server's implementation (or lack thereof) of anything if there's E2EE involved anyway...

@richvdh
Member

richvdh commented Dec 24, 2021

And now that we've transferred it it seems that matrix-org/matrix-spec-proposals#3796 is the duplicate for this.

matrix-org/matrix-spec-proposals#3796 is a proposal to fix it; this is the canonical issue.

@richvdh reopened this on Dec 24, 2021
@richvdh transferred this issue from matrix-org/matrix-spec-proposals on Mar 2, 2022
@turt2live added the feature (Suggestion for a significant extension which needs considerable consideration), security, and A-Media-Repository labels on May 31, 2022
@n0toose

n0toose commented Aug 18, 2022

Please correct me if I am wrong, but the reasoning in this thread:

  • assumes a lot about the computational resources available to a party interested in obtaining files/"every computer belonging to a single person is equal".
  • does not account for the fact that a range of (26*2)^24 gets increasingly more "insecure" the bigger a homeserver gets in regards to how many files are uploaded to it and how many files are generally available to it.
  • does not account for the theoretical chance that such passwords will be much easier to crack in the future.
  • assumes that non-end-to-end encrypted files should be readable to anyone other than the participants in the conversation, the recipients of the link to a file, and the server operators, which is the expectation that Matrix/Element itself sets towards the user.
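The second point can be made concrete with a rough calculation. The sketch below assumes media IDs are drawn uniformly at random from the (26*2)^24 range; the function name and the file counts are arbitrary illustrations, not measurements of any real homeserver.

```python
KEYSPACE = (26 * 2) ** 24


def expected_guesses_per_hit(stored_files: int) -> int:
    """Average number of random guesses needed to land on any valid ID."""
    # Each guess hits a valid ID with probability stored_files / KEYSPACE,
    # so the expected number of guesses per hit is the inverse of that.
    return KEYSPACE // stored_files


for stored in (10**6, 10**9, 10**12):
    guesses = expected_guesses_per_hit(stored)
    print(f"{stored:>18,} files -> about {guesses:.2e} guesses per hit")
```

The number of guesses needed shrinks in direct proportion to the number of stored files, although under these assumptions it stays very large.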

This "too negligible for most people to actually communicate it properly" approach is personally making me feel uneasy, even if it were possibly more likely for me to get struck by lightning, considering that there are opportunities (in the future) to actually bring the chance of anyone ever receiving anything down to an absolute zero.

@FlyveHest

A comment on this: as far as I can see, this will break media being shared across bridges, unless those bridges relay binary data directly.

But in turn, this will defeat the purpose of protecting the media, since it will be directly available on another platform, maybe without the original poster being aware of this.

@Iruwen

Iruwen commented Sep 7, 2023

Correct me if I'm wrong, but: this makes Matrix a great filesharing host. Just create an anonymous account and an unencrypted non-public room and upload whatever you want in chunks as big as the server allows, then let the world know about the URLs to be consumed by tools like JDownloader. With some more effort on the client side, having public access to encrypted chunks is even more perfidious. And the server operator is probably liable for any illegal content (hello DMCA takedown or worse).

@axelsimon

@Iruwen In most (many?) legal regimes, you are only liable for things you know to be hosting, and become liable once you've been informed of the case (and often, the material must also be "manifestly illegal" or similar). Simply having something "bad" on your server doesn't automatically make you liable.

A lot of large services (such as Youtube) will automatically take something down as soon as they are notified that it could be problematic, because that's when their legal liability starts. But most of the time they don't care to check whether it is actually problematic, especially for copyright matters & fair use/dealing (hence DMCA takedown requests being weaponised).

@rltas

rltas commented Oct 20, 2023

Until this is resolved, I added a Lua script in my nginx reverse proxy which only allows media access for IP addresses that have successfully accessed the /capabilities or /sync endpoints, which seem to be two authenticated ones that are reliably accessed first.
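The gating logic described here could be sketched roughly as follows. This is not the commenter's Lua script: it is a Python illustration of the same idea, and the class name, the one-hour expiry, and the status check are assumptions added for the example.

```python
import time

AUTH_PATH_SUFFIXES = ("/capabilities", "/sync")  # authenticated endpoints
MEDIA_PREFIX = "/_matrix/media/"
ALLOW_SECONDS = 3600  # hypothetical: how long an IP stays allowed


class MediaGate:
    """Allow media requests only from IPs seen on authenticated endpoints."""

    def __init__(self) -> None:
        self._seen = {}  # ip -> time of last successful authenticated request

    def observe(self, ip: str, path: str, status: int, now=None) -> None:
        """Call after each proxied response to record successful auth."""
        now = time.time() if now is None else now
        if status == 200 and path.endswith(AUTH_PATH_SUFFIXES):
            self._seen[ip] = now

    def allow_media(self, ip: str, path: str, now=None) -> bool:
        """Call before proxying a request; non-media paths pass through."""
        now = time.time() if now is None else now
        if not path.startswith(MEDIA_PREFIX):
            return True
        last = self._seen.get(ip)
        return last is not None and now - last <= ALLOW_SECONDS
```

This identifies clients by IP address only, so users behind the same NAT share access.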

@immanuelfodor

Could you please share the config? How can one achieve this? (A full example would be great, including the Docker commands if possible. AFAIK nginx doesn't include the Lua engine anymore, so I need something extra to have Lua alongside nginx.)

@rltas

rltas commented Oct 20, 2023

I can do that later, yeah. In my case it's integrated with https://github.com/spantaleev/matrix-docker-ansible-deploy and thus involves Traefik as well, but it should be easy to adapt.

@richvdh
Member

richvdh commented Oct 20, 2023

Until this is resolved, I added a Lua script in my nginx reverse proxy which only allows media access for ip addresses that successfully accessed the /capabilities or /sync endpoints, which seem to be two authenticated ones that are reliably accessed first.

Be aware that this will break federation: it will mean that users on other servers will be unable to view media uploaded on your server.

@rltas

rltas commented Oct 20, 2023

Yeah I'm not federating, thanks for pointing that out. I guess if you're looking for some extra privacy without aiming for the obvious solution that is encryption, you'll have a specific reason for that tradeoff.

@n0toose

n0toose commented Jun 13, 2024

@turt2live thanks!

@kethinov
Author

@turt2live the original link in my original post can still be viewed without authentication. Has this change gone live yet on the matrix.org homeserver?

And will it apply to all previous media, or only to new media shared after the change goes live?

@turt2live
Member

The matrix.org homeserver's rollout is being worked out following the spec change - there should be more detail in a few weeks (watch the matrix.org blog for updates).

The spec change does not add authentication to existing endpoints, but rather introduces new ones. Servers are being advised to freeze the unauthenticated endpoints, like the one linked above, rather than add authentication retroactively. Media from before the freeze will remain accessible on the old endpoints while new media will only be accessible on the new endpoints. This is what matrix.org plans to do as well.
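As a rough illustration of the freeze described above, the serving decision could be modelled like this. The endpoint labels, the function name, and the dates are placeholders, and the rule that the new endpoint serves any media to authenticated callers is an assumption that the comment does not spell out.

```python
from datetime import datetime, timezone


def may_serve(endpoint: str, uploaded_at: datetime, freeze_at: datetime,
              authenticated: bool) -> bool:
    """Decide whether a download request should be served.

    endpoint is "legacy" for the old unauthenticated path or
    "authenticated" for the new one (both labels are hypothetical).
    """
    if endpoint == "legacy":
        # Frozen: only media that existed before the freeze is still served.
        return uploaded_at < freeze_at
    if endpoint == "authenticated":
        # Assumption: any media is served here, but only to authenticated callers.
        return authenticated
    raise ValueError(f"unknown endpoint: {endpoint}")


# Placeholder dates for illustration only.
freeze = datetime(2030, 1, 1, tzinfo=timezone.utc)
old_upload = datetime(2029, 6, 1, tzinfo=timezone.utc)
new_upload = datetime(2030, 6, 1, tzinfo=timezone.utc)

print(may_serve("legacy", old_upload, freeze, authenticated=False))
print(may_serve("legacy", new_upload, freeze, authenticated=False))
```

Under this model the link from the original post keeps working on the old endpoint, while anything uploaded after the freeze needs an authenticated request.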
