Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: add vetted properties #15

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

McPatate
Copy link

@McPatate McPatate commented Feb 18, 2022

As mentioned in #14, here is a first attempt at implementing an option for us to allow some library calls.

For the moment, this is a very manual process in the sense that we have to create a list of modules and the function calls that are ok by hand.

If you have any idea on improving this, your contribution is more than welcome !

@CLAassistant
Copy link

CLAassistant commented Feb 18, 2022

CLA assistant check
All committers have signed the CLA.

Copy link

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if feasibly we can vet functions/methods in the libraries. I think vetting the library would be enough? We could have unsafe parts of a library, but I'm not sure if we want to include all methods and classes from the libraries we trust here.

Also, I think another useful thing would be to be able to get a list of those libraries which are used/imported by the pickle file, for downstream analysis.

It would also be useful to add this to the command line arguments of the CLI tool.

fickling/pickle.py Outdated Show resolved Hide resolved
fickling/pickle.py Outdated Show resolved Hide resolved
Comment on lines 448 to 451
if self._vetted_dependencies is None or self._vetted_calls is None:
return not self.has_import and not self.has_non_setstate_call
else:
return not self.has_unvetted_import and not self.has_unvetted_calls

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't is_likely_safe be the and of all 4 conditions when applicable?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.has_import and self.has_unvetted_import as well as self.has_non_setstate_call and self.has_unvetted_calls would clash, given that the latter is more permissive than the former.

@McPatate
Copy link
Author

I'm not sure if feasibly we can vet functions/methods in the libraries. I think vetting the library would be enough? We could have unsafe parts of a library, but I'm not sure if we want to include all methods and classes from the libraries we trust here.

I understand this is cumbersome. I'm not sure how to link a function call to its module. I'm currently looking in the ASTProperties object of Pickled which contains imports and calls as two separate lists.
It may be possible by parsing the AST, I have to dig deeper.

Also, I think another useful thing would be to be able to get a list of those libraries which are used/imported by the pickle file, for downstream analysis.

To achieve this :

from fickling.pickle import Pickled
import pickle

pkl = b'cbuiltins\neval\n(Vprint("hello")\ntR.'
p = Pickled.load(pkl)
imports = [imp.module for imp in p.properties.imports]
print(imports)

Do you think it's worth creating a function that returns imports ? Or just adding it to the cli suffices for now (imo should be the matter of another PR) ?

@adrinjalali
Copy link

Do you think it's worth creating a function that returns imports ? Or just adding it to the cli suffices for now (imo should be the matter of another PR) ?

Fair enough, no need to add it.

Copy link
Collaborator

@ESultanik ESultanik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% convinced yet that any imports are safe. For example, numpy is not, in general. At a minimum, I think fickling should print out a warning if any vetted imports or calls are provided. Something like, "Warning: This pickle file imports external module [X] which was added to the vetted module list." We should probably also add a command-line option to specify them. I'm happy to help out with that.

fickling/pickle.py Outdated Show resolved Hide resolved
fickling/pickle.py Outdated Show resolved Hide resolved

arr = np.array([1,2,3])
fickled = Pickled.load(pickle.dumps(arr))
fickled.vetted_dependencies = ["numpy"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numpy itself is not safe. For example: https://numpy.org/doc/stable/reference/generated/numpy.distutils.exec_command.html

How are vetted_dependencies and vetted_calls intended to interact? Are ndarray and dtype the only calls within numpy that are allowed? This should be documented somewhere in the README. I'm happy to help with that.

@adrinjalali
Copy link

I'm not 100% convinced yet that any imports are safe. For example, numpy is not, in general. At a minimum, I think fickling should print out a warning if any vetted imports or calls are provided. Something like, "Warning: This pickle file imports external module [X] which was added to the vetted module list."

I'd be happy with that.

numpy itself is not safe. For example: https://numpy.org/doc/stable/reference/generated/numpy.distutils.exec_command.html

How are vetted_dependencies and vetted_calls intended to interact? Are ndarray and dtype the only calls within numpy that are allowed? This should be documented somewhere in the README. I'm happy to help with that.

Fair point. But I'm not sure if vetting is in the scope of fickling. I was thinking of fickling just to allow people to add allowed modules/methods. We could probably have a separate library to have suggestions for vetted sections of different libraries and hope people would contribute to them?

@McPatate
Copy link
Author

McPatate commented Mar 1, 2022

As @adrinjalali said, the idea is indeed to let the user add a list of libraries and function calls within it as "vetted".

I also do not think fickling should vet any particular library, this should be at the discretion of users (though you could have a "community vetted librairies" example or document, but that doesn't seem safe to me).

The only problem I see with the current implementation is that there is no link between function calls and the module they're from, e.g. dtype could come from another imported module and not be safe.

Are ndarray and dtype the only calls within numpy that are allowed?

That was my idea, yes. I think this way you avoid allowing numpy calls that execute arbitrary code as you showed.

McPatate added 3 commits March 7, 2022 15:17
* I am for now not assuming anything about unvetted dependencies
  but am just providing a helper function to find them
@McPatate
Copy link
Author

McPatate commented Mar 7, 2022

@ESultanik, @adrinjalali

I think this is better to flag authorized imports and their respective calls. That being said, is_likely_safe will still return False for the moment.
I need to figure out a way to make sure that when we have ast.Call objects, that the actual function call is for an imported function (that can be checked with has_unvetted_dependencies) and not some user defined function (which I think is unsafe ?).

@Narsil
Copy link

Narsil commented Mar 7, 2022

I'm not 100% convinced yet that any imports are safe. For example, numpy is not, in general.

I think it's even worse than that:

import numpy as np
np.sys.modules["os"].system("echo hello")
getattr(locals()["np"], "sys").modules["os"].system("echo hello") # In case you imagined that you could saveguard in anyform this.

Any module that ever import os or subprocess is not safe to import.

@ESultanik
Copy link
Collaborator

I think I may have misunderstood this PR the first time I considered it. I'm willing to reconsider adding an --allow-list of imports to ignore, as long as there are none enabled by default. I am going to extend this branch to enable their specification via the command line.

@McPatate
Copy link
Author

McPatate commented Sep 7, 2022

Nice ! Feel free to refactor everything if it doesn't behave in a way that you find appropriate.

This was mostly an example of what we could do !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants