Replace Synapse antispam module with a "rule server" counterpart #78
Conversation
Fixes #37

Synapse reviewer: I'm not expecting review on the JS/TS bits, just the Python parts to make sure I haven't violated every security principle on the planet. Due to the diff, it might make sense to review it from the raw view rather than the line view.

**Note**: this completely replaces the existing Synapse antispam module. Admins are encouraged to upgrade upon release, though they will have to make significant changes to how their configuration works.
Generally LGTM, but see comments.
First, run the Docker image for the rule server. This is what will be serving the generated
Python for the Synapse antispam module to read from. This rule server will serve the Python
off a webserver at `/api/v1/py_rules` which must be accessible by wherever Synapse is installed.
From my testing, the rules server appears to serve the rules from any endpoint, not just `/api/v1/py_rules`.
Yeah, that's technically a bug, but the documentation should describe what people should expect to work.
# HACK: Private member access (_hs)
resp = await self._api._hs.get_proxied_http_client().get_json(self._config['rules_url'])

# *** !! DANGER !! ***
The Python module looks generally sane and I don't see anything obviously wrong.
From a defence in depth perspective, I do have a general unease around using `eval` on code received from a server. I understand this is coming from a trusted server, but it does make it easier for an attacker to escalate and move laterally in case he somehow manages to take control of the rules server.
Is there a particular reason why the rules need to be Python code instead of a small (e.g. JSON-based) DSL that is only able to encode the primitives that will reasonably be used for constructing ban expressions?
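For illustration, a minimal sketch of the sort of DSL I mean, assuming a hypothetical rule shape and subject names (nothing here is from this PR):

```python
import re

# Hypothetical rule shape the rule server could return instead of Python code:
#   {"subject": "user_id", "pattern": "@spam.*:evil\\.example\\.org"}
# The checker only knows how to look up a fixed set of subjects, so the server
# can never make the module evaluate arbitrary expressions.
ALLOWED_SUBJECTS = {"user_id", "user_domain", "room_id", "displayname"}

def rule_matches(rule: dict, context: dict) -> bool:
    subject = rule.get("subject")
    if subject not in ALLOWED_SUBJECTS:
        return False  # unknown subject: ignore the rule rather than guess
    value = context.get(subject, "")
    return isinstance(value, str) and re.search(rule["pattern"], value) is not None
```

The point is just that the set of things a rule can inspect is fixed in the module, not chosen by the server.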
I agree with @dkasak that this knowingly leaves open a potentially insecure vector. Note that the potentially generated rules (e.g. `resp["checks"][...]["search"]`) seem to be one of:
- `event.get(..., '')`
- `user_profile[...]`
- A direct variable (e.g. `room_id` or `user_id`).
- `UserID.from_string(user_id).domain`
Some potential alternatives:
- Have the spam checker call the server for each query and handle a simple yes/no response. 🤷 This would only work for Synapse v1.25.0, where spam checker methods can be async, and might have serious performance implications.
- Since the generated expressions are generally accessing variables or dictionary keys, have the server return a dotted dictionary accessor (e.g. `user_id` or `user_profile.user_id`) and do some magic with splitting on `.`, something like:
accessor = "user_profile.user_id" # this would be returned by the server
value = {"user_profile": user_profile}
for key in accessor.split("."):
value = getattr(value, key, "")
if re.search(check["pattern"], value):
return True
Note that something special would need to be done to get the user's domain, but that could be handled fairly easily by doing something like the following for the initial value:
```python
value = {
    "user_id": user_id,
    "user_domain": UserID.from_string(user_id).domain,
}
```
The accessor would then be `user_id` or `user_domain`.
In general this approach seems overly flexible for the rules that get generated in `RuleServer.ts`, but maybe there's a future plan I don't know about. Even my example above seems like it could be simplified quite a bit by knowing the data returned for each rule and what it gets applied against, which would also make the code a bit less abstract.
Note that if the dotted accessor approach is taken, the accessor keys should still be checked against a whitelist instead of allowing everything (which brings us back to a DSL). If a fully general accessor is allowed, it might be possible for an attacker to navigate the object to a secret string and then leak it by successively adapting a rule and checking whether it triggers.
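A rough sketch of that whitelist check on top of the dotted-accessor idea (the accessor names here are illustrative):

```python
# Only accessors the module explicitly knows about may be resolved; anything
# else from the rule server is rejected outright instead of being walked.
ALLOWED_ACCESSORS = {"user_id", "user_domain", "room_id", "user_profile.displayname"}

def resolve_accessor(accessor: str, values: dict) -> str:
    if accessor not in ALLOWED_ACCESSORS:
        raise ValueError(f"Disallowed accessor from rule server: {accessor!r}")
    value = values
    for key in accessor.split("."):
        value = value.get(key, "") if isinstance(value, dict) else ""
    return value if isinstance(value, str) else ""
```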
rooms: false # rooms don't have usernames and can't be blocked.
servers: false # the only rule which would apply is one for the local server.
Why even have these in the default config if they don't apply?
Aside from the interface being easier to copy/paste and understand, mjolnir doesn't protect people from making mistakes, almost by design: if someone really wanted to turn this switch on due to a massive attack of some sort, they could.
# HACK: Private member access (_hs)
api._hs.get_clock().looping_call(self._update_rules, 5 * 1000)
Exposing the clock as part of the module API would probably be reasonable, although that would quite widen the API surface we would have to consider stable.
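In the meantime, one possible way to avoid the private `_hs` access is to schedule the refresh with Twisted's own `LoopingCall` (a sketch, assuming the module runs inside the reactor Synapse already drives and that `_update_rules` is a coroutine):

```python
from twisted.internet import defer, task

# Inside the module's __init__: refresh every 5 seconds, mirroring the current
# 5 * 1000 ms interval. Note that a LoopingCall stops looping if a call fails,
# so _update_rules should catch and log its own exceptions.
self._update_loop = task.LoopingCall(
    lambda: defer.ensureDeferred(self._update_rules())
)
self._update_loop.start(5.0, now=True)
```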
# HACK: Private member access (_hs)
api._hs.get_clock().looping_call(self._update_rules, 5 * 1000)

# These are all arrays of compile()'d code from the ban list server.
Compiled code...that does what though? Are these lists of callables?
# HACK: Private member access (_hs)
resp = await self._api._hs.get_proxied_http_client().get_json(self._config['rules_url'])
The `ModuleApi` object has an HTTP client available. It is not the proxied HTTP client, which should be fine though, since `rules_url` is likely internal and doesn't need a proxy?
ah ha, this isn't documented :p
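For reference, a sketch of what using that client might look like, assuming it is exposed as `api.http_client` (which, as noted above, isn't documented, so treat the attribute name as an assumption):

```python
# Same call as above, but via the client the ModuleApi already exposes rather
# than reaching through the private _hs attribute.
resp = await self._api.http_client.get_json(self._config["rules_url"])
```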
return defer.ensureDeferred(run())

def _compile_rules(self, rules):
It would help if the inputs and outputs of these methods were documented a bit. It seems that `rules` is a dictionary with two keys:
- `search`: Python code as a string. It seems to be expected to be an expression that has access to some locally defined variables, which differ depending on the check being done.
- `pattern`: a string regular expression.
A couple of potential improvements (a combined sketch follows below):
- Use an attrs class instead of a dictionary for the returned lists (this should use less memory and allow for more efficient access to properties).
- Pre-compile the regular expressions using `re.compile` and then call `check["pattern"].search(search)`.
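A sketch of both suggestions together, with illustrative names (not code from this PR):

```python
import re
from typing import List, Pattern

import attr

@attr.s(auto_attribs=True, frozen=True, slots=True)
class CompiledCheck:
    search: str       # the expression string the module will evaluate
    pattern: Pattern  # the regex, compiled once when the rules are updated

def compile_checks(checks: List[dict]) -> List[CompiledCheck]:
    return [
        CompiledCheck(search=c["search"], pattern=re.compile(c["pattern"]))
        for c in checks
    ]
```

At match time the lookup then becomes `check.pattern.search(search)` instead of `re.search(check["pattern"], search)`.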
for check in self._code_spam_checks:
    params = {
        "event": event,
        "UserID": UserID,
Passing in a class here seems a bit scary to me; it might allow an attacker to modify the behavior of that class throughout Synapse via monkeypatching. (I suppose this is kind of true if you allow access to anything that isn't a primitive, though.)
"UserID": UserID, | ||
} | ||
search = eval(check["search"], {}, params) | ||
if re.search(check["pattern"], search): |
Should we ensure that `search` is a string?
"event": event, | ||
"UserID": UserID, | ||
} | ||
search = eval(check["search"], {}, params) |
This can throw if the received code doesn't make sense with the input parameters. Right now this will raise an exception; is that the proper behavior?
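If the answer is "no, a bad rule shouldn't break event handling", one possible shape (a sketch, assuming a module-level `logger`; skipping the rule and logging is just one policy, and it also covers the earlier question about `search` being a string):

```python
try:
    search = eval(check["search"], {}, params)
except Exception:
    logger.warning("Failed to evaluate rule %r", check["search"], exc_info=True)
    continue  # skip this rule rather than failing the whole spam check
if not isinstance(search, str):
    continue  # the rule didn't produce text we can regex-match
if re.search(check["pattern"], search):
    return True
```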