-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: add vetted properties #15
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if feasibly we can vet functions/methods in the libraries. I think vetting the library would be enough? We could have unsafe parts of a library, but I'm not sure if we want to include all methods and classes from the libraries we trust here.
Also, I think another useful thing would be to be able to get a list of those libraries which are used/imported by the pickle file, for downstream analysis.
It would also be useful to add this to the command line arguments of the CLI tool.
fickling/pickle.py
Outdated
if self._vetted_dependencies is None or self._vetted_calls is None: | ||
return not self.has_import and not self.has_non_setstate_call | ||
else: | ||
return not self.has_unvetted_import and not self.has_unvetted_calls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't is_likely_safe
be the and
of all 4 conditions when applicable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.has_import
and self.has_unvetted_import
as well as self.has_non_setstate_call
and self.has_unvetted_calls
would clash, given that the latter is more permissive than the former.
I understand this is cumbersome. I'm not sure how to link a function call to its module. I'm currently looking in the
To achieve this : from fickling.pickle import Pickled
import pickle
pkl = b'cbuiltins\neval\n(Vprint("hello")\ntR.'
p = Pickled.load(pkl)
imports = [imp.module for imp in p.properties.imports]
print(imports) Do you think it's worth creating a function that returns |
Fair enough, no need to add it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% convinced yet that any imports are safe. For example, numpy
is not, in general. At a minimum, I think fickling
should print out a warning if any vetted imports or calls are provided. Something like, "Warning: This pickle file imports external module [X] which was added to the vetted module list." We should probably also add a command-line option to specify them. I'm happy to help out with that.
example/vetted.py
Outdated
|
||
arr = np.array([1,2,3]) | ||
fickled = Pickled.load(pickle.dumps(arr)) | ||
fickled.vetted_dependencies = ["numpy"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numpy
itself is not safe. For example: https://numpy.org/doc/stable/reference/generated/numpy.distutils.exec_command.html
How are vetted_dependencies
and vetted_calls
intended to interact? Are ndarray
and dtype
the only calls within numpy
that are allowed? This should be documented somewhere in the README. I'm happy to help with that.
I'd be happy with that.
Fair point. But I'm not sure if vetting is in the scope of |
As @adrinjalali said, the idea is indeed to let the user add a list of libraries and function calls within it as "vetted". I also do not think fickling should vet any particular library, this should be at the discretion of users (though you could have a "community vetted librairies" example or document, but that doesn't seem safe to me). The only problem I see with the current implementation is that there is no link between function calls and the module they're from, e.g.
That was my idea, yes. I think this way you avoid allowing numpy calls that execute arbitrary code as you showed. |
* I am for now not assuming anything about unvetted dependencies but am just providing a helper function to find them
I think this is better to flag authorized imports and their respective calls. That being said, |
I think it's even worse than that: import numpy as np
np.sys.modules["os"].system("echo hello")
getattr(locals()["np"], "sys").modules["os"].system("echo hello") # In case you imagined that you could saveguard in anyform this. Any module that ever import |
I think I may have misunderstood this PR the first time I considered it. I'm willing to reconsider adding an |
Nice ! Feel free to refactor everything if it doesn't behave in a way that you find appropriate. This was mostly an example of what we could do ! |
As mentioned in #14, here is a first attempt at implementing an option for us to allow some library calls.
For the moment, this is a very manual process in the sense that we have to create a list of modules and the function calls that are ok by hand.
If you have any idea on improving this, your contribution is more than welcome !