Feature Request: a way to opt out of "hermetic python" #22216

Open
2 tasks done
SomeoneSerge opened this issue Jul 1, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@SomeoneSerge

Hi! We're encountering issues concerning #20469 with the jax{,lib} packages in Nixpkgs. It is crucial to our project to bootstrap the python interpreter from scratch and outside bazel, rather than download a prebuilt one from the internet. We've been relying on bazel selecting the system python (by setting PYTHON_BIN_PATH) for this. Right now we're looking into reverting the respective diff ad hoc, but we'd prefer a tighter integration.

Thanks

Please:

  • Check for duplicate requests.
  • Describe your goal, and if possible provide a code snippet with a motivating example.
@vam-google
Contributor

vam-google commented Jul 1, 2024

Hello @SomeoneSerge,

For projects that want to provide their own Python interpreter (a Linux distribution would be a good example), there is a way to do exactly that. For the other concerns, please check my response in the corresponding nixpkgs thread.

Specifying your own Python interpreter

For a basic example of specifying your own interpreter please check Building with pre-release Python version.

Note that you do not have to follow those instructions directly if you don't want to. The only thing you actually need is to be able to specify a path to your desired (hopefully working) python interpreter in the end. The instructions just serve the purpose of obtaining such a working custom interpreter; if you already have one, you can skip directly to the last step:

To use the newly built Python interpreter, add the following code snippet RIGHT AFTER python_init_toolchains() in your WORKSPACE file.

load("@rules_python//python:repositories.bzl", "python_register_toolchains")
python_register_toolchains(
    name = "python",
    # By default assume the interpreter is on the local file system, replace
    # with proper URL if it is not the case.
    base_url = "file://",
    ignore_root_user_error = True,
    python_version = "3.13.0a6",
    tool_versions = {
        "3.13.0a6": {
            # Path to .tar.gz with Python binary.
            "url": "/full/path/to/your/python_dev-3.13.0a6.tgz",
            "sha256": {
                # By default we assume Linux x86_64 architecture, replace with
                # proper architecture if you were building on a different platform.
                "x86_64-unknown-linux-gnu": "cd99233ccd2df426208be3d394e1b89bbb2511caf389cfa9df7bab624a6cdc24",
            },
            "strip_prefix": "python_dev-3.13.0a6",
        },
    },
)

Mimic old non-hermetic behavior (very very not recommended)

We do not support and do not recommend this use case, but it is still possible with a little bit of work on your side.

The instructions above assume you have python packaged as a standalone .tgz archive. If you still want to just depend on whatever is installed locally on your system, you can go further but there is an important thing to know before doing so (which may affect your decision):

Even in the previous non-hermetic python setup, the build was wrapping the system python inside bazel rules and copying parts of your system python package into bazel's cache to be able to use it for the rest of the build. I.e. non-hermetic python acted almost as if it were still downloading a python from somewhere; it was just that "somewhere" happened to be your local environment.

With that being said, you can mimic the old non-hermetic python setup with a custom repository rule that searches your local system, packages the interpreter as a standalone archive matching the structure of the ones we currently depend on (that structure matches the default layout you get when building vanilla Python from the official sources), and then supplies the packaged archive as the value of the url field in the code snippet above.

Note that we do not provide such a custom local-file-system-search rule ourselves and do not plan to, as it would basically re-introduce non-hermetic python with all its issues, such as being a very fragile and non-reproducible setup. It is not very difficult to implement on your side, though, especially if you do not need to make it generic (if it only has to match the NixOS structure, then it would be much easier to implement and maintain than something that should work on any Linux system).
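To make the packaging step more concrete, here is a minimal sketch in plain Python of what the "package it as a standalone archive" part could look like when done outside Bazel. It is not a repository rule and not something this project ships. The function names (package_python_prefix, sha256_of, tool_versions_entry) and the top_dir argument are hypothetical. The sketch assumes the interpreter lives under a single prefix directory and that symlinks pointing outside that prefix are acceptable as-is.

```python
import hashlib
import os
import tarfile


def package_python_prefix(prefix, archive_path, top_dir):
    """Pack the tree under `prefix` into a .tar.gz with one top-level dir.

    `top_dir` becomes the archive root, i.e. the value for "strip_prefix".
    Symlinks are stored as symlinks (tarfile's default), so links that point
    outside `prefix` are NOT made self-contained by this sketch.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(prefix, arcname=top_dir)
    return archive_path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tool_versions_entry(prefix, archive_path, top_dir,
                        platform="x86_64-unknown-linux-gnu"):
    """Build a dict shaped like one value of `tool_versions` in the snippet."""
    package_python_prefix(prefix, archive_path, top_dir)
    return {
        "url": os.path.abspath(archive_path),
        "sha256": {platform: sha256_of(archive_path)},
        "strip_prefix": top_dir,
    }
```

The returned dict mirrors the "url", "sha256" and "strip_prefix" keys used above, so its values can be pasted into the WORKSPACE snippet. Note that the digest will differ between runs, because the gzip and tar headers record timestamps. A reproducible archive would need those normalized, which this sketch does not attempt.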

@SomeoneSerge
Author

Hi! Thank you @vam-google, for a prompt and comprehensive reply. I hope my comment about "random executables" in the other issue wasn't too rude, otherwise I'm prepared to apologize:)

We'll look into implementing a solution along the lines of the snippet you provided. I expect it won't be a "smooth ride".

I think it's best I avoid "explaining Nix" in this thread (maybe we reserve the Nixpkgs issue for that) and focus on our integration with Bazel. Suffice it to say that Nix and Bazel overlap in scope, which is why (1) we're not concerned about reproducibility and correctness of "system packages", and (2) having Bazel set up its own sandbox and copy these packages into new locations has been problematic.

For now, there are two implementation details about the python we build I'm worried about:

  • Currently, we only consider our outputs "correct" if they're deployed at certain pre-defined locations: e.g. /nix/store/<hash>-python-<version>/bin/python will have a PT_INTERP pointing at a specific /nix/store/<hash>-glibc-<version>/lib/ld-linux<...>.so, etc. Things would likely still work: since Bazel implements hermetic builds, I presume it would patchelf the executables it "downloads"?
  • We only consider our outputs "correct" when referential integrity is maintained: if <...>/bin/python references <...>/libz.so, this python can only be deployed together with it. We have ways to inspect and export the full "closure" of such a package, but we've never explicitly provided Bazel's "old non-hermetic python" with this information. Does this mean that Bazel has been copying "our python" and relinking it to different versions of dependencies for the duration of the build?
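One way to check the first point empirically would be to compare the PT_INTERP entry of the original interpreter with that of the copy Bazel ends up using. The following is a rough sketch under stated assumptions: it only handles 64-bit little-endian ELF files, it reads the whole file into memory, and read_pt_interp is a hypothetical helper name, not part of Bazel or Nix.

```python
import struct

PT_INTERP = 3  # program header type that holds the interpreter path


def read_pt_interp(path):
    """Return the PT_INTERP path of a 64-bit little-endian ELF, or None."""
    with open(path, "rb") as f:
        data = f.read()
    # e_ident: magic bytes, EI_CLASS == 2 (64-bit), EI_DATA == 1 (little endian)
    if len(data) < 64 or data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF file")
    # ELF64 header: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        # Elf64_Phdr starts with: p_type, p_flags, p_offset, p_vaddr,
        # p_paddr, p_filesz
        p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, off
        )
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None
```

Running it on both the /nix/store binary and the corresponding file inside Bazel's output tree would show whether the interpreter path survives the copy unchanged. It says nothing about the second point (which libz.so gets loaded at run time); that would need a look at DT_NEEDED and RUNPATH as well.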

Perhaps these notions of "correctness" (=conditions under which we're ready to "guarantee" our software will work) also clarify why I believed it would be easiest for us if we could relax Bazel's sandbox and let it see the pre-deployed toolchain.
