First draft of perennial manylinux PEP #304

Closed

takluyver
Copy link
Member

Following discussion at: https://discuss.python.org/t/the-next-manylinux-specification/1043

I haven't yet written anything about allowed architectures. Both manylinux1 and manylinux2010 explicitly allow only x86_64 or i686, because the versions of CentOS they are based on only support those architectures. Implicitly, not saying anything means it's up to whatever profiles auditwheel defines. But we probably want to mention this, even if only to confirm that it's up to auditwheel.

We might also want to say something about ARM tags, if only to avoid giving people the impression that an ARM tag is just around the corner. See #84 for that topic.

@xhochy
Copy link

xhochy commented May 3, 2019

This only covers the glibc aspect of manylinux. How do we want to incorporate e.g. the versioned libstdc++ symbols?

@takluyver
Copy link
Member Author

That's what this bit is meant to cover:

As with the previous manylinux tags, wheels will be allowed to link against a limited set of external libraries and symbols. These will be defined by profiles in auditwheel.

This is the result of the discussion on the Discourse thread linked above.

@jjhelmus
Copy link

jjhelmus commented May 6, 2019

We might also want to say something about ARM tags, if only to avoid giving people the impression that an ARM tag is just around the corner.

The armhf/armhfp and aarch64/arm64 ARM ports are well defined, and multiple Linux distributions support them, including CentOS 7. The Debian wiki page on ARM Ports is a good jumping-off point to learn about these ports. I think these ports should be supported in this PEP. The sticky part is support for Raspberry Pi. I think armhf wheels may work on the Raspberry Pi 2 and 3 but not on earlier models.

In addition, I believe there would be interest in supporting POWER8 or other POWER architectures in the PEP.

@njsmith
Copy link
Member

njsmith commented May 6, 2019

I think the PEP can support arbitrary architectures, and let auditwheel worry about the ARM ABI details.

@jjhelmus
Copy link

jjhelmus commented May 6, 2019

This might be premature discussion or the incorrect venue to bring up this issue. I'm happy to move this discussion to a more appropriate location.

As I mentioned in the python.org discussion, glibc is not always sufficient to determine binary compatibility between Linux distributions; the GCC runtime library and libstdc++ versions are also important. @xhochy's comment above mentions this as well.

If CentOS 7 is used as a base for compatibility the binary compatibility would be based on glibc 2.17 and GCC 4.8.5. The GCC version could create compatibility issues with the following Ubuntu and Fedora releases which ship with older GCC libraries:

  • Ubuntu 13.04, glibc 2.17, gcc 4.7.3
  • Ubuntu 13.10, glibc 2.17, gcc 4.8.1
  • Ubuntu 14.04, glibc 2.19, gcc 4.8.2
  • Fedora 19, glibc 2.17, gcc 4.8.1
  • Fedora 20, glibc 2.18, gcc 4.8.2

New symbols were added to libstdc++ in GCC 4.8.3, which may be included in extensions built by the CentOS 7 toolchain but would not be present in the libstdc++ libraries included in these Ubuntu and Fedora releases. All of these releases are past their LTS support date (Ubuntu 14.04 was the most recent, and it was supported until 2019-04), so this possible incompatibility may not be of large concern.

@takluyver
Copy link
Member Author

Would one way to handle that be to survey the libraries in distributions released with glibc 2.17 or later, and pick the minimum version (in a distro that's worth supporting) to specify in the build profile? But presumably this would make it harder to create compliant wheels in a docker image based on CentOS...
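A rough sketch of what such a survey could compute, using only the figures listed in the previous comment. The `SURVEY` table and the `minimum_gcc` helper are hypothetical names for illustration, and Fedora 20's glibc is taken as 2.18, matching the auditwheel output quoted later in this thread.

```python
# Hypothetical survey table: distro -> (glibc version, gcc version).
# Figures come from the list in the previous comment; nothing here is
# an authoritative record of what each distro shipped.
SURVEY = {
    "CentOS 7": ((2, 17), (4, 8, 5)),
    "Ubuntu 13.04": ((2, 17), (4, 7, 3)),
    "Ubuntu 13.10": ((2, 17), (4, 8, 1)),
    "Ubuntu 14.04": ((2, 19), (4, 8, 2)),
    "Fedora 19": ((2, 17), (4, 8, 1)),
    "Fedora 20": ((2, 18), (4, 8, 2)),  # glibc assumed to be 2.18
}


def minimum_gcc(survey, min_glibc, worth_supporting=None):
    """Return (distro, gcc) for the oldest gcc among qualifying distros.

    A distro qualifies if its glibc is at least ``min_glibc`` and, when
    ``worth_supporting`` is given, its name is in that collection.
    """
    candidates = {
        name: gcc
        for name, (glibc, gcc) in survey.items()
        if glibc >= min_glibc
        and (worth_supporting is None or name in worth_supporting)
    }
    oldest = min(candidates, key=candidates.get)
    return oldest, candidates[oldest]


print(minimum_gcc(SURVEY, (2, 17)))
print(minimum_gcc(SURVEY, (2, 17), {"CentOS 7", "Ubuntu 14.04"}))
```

The second call shows how the "worth supporting" filter changes the answer: dropping the short-lived releases raises the floor, which is the trade-off raised above.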

Is there a good way to see what different distros had, including ones that are no longer formally supported? If not, maybe that's something someone could sprint on - collecting that information and displaying it in a convenient form for answering these kinds of questions.

@jjhelmus
Copy link

jjhelmus commented May 6, 2019

distrowatch.com has tables of the major libraries included in many distributions that could be used.

of pip, so distributors are concerned about moving to a new tag too quickly.

This PEP aims to remove the need for steps 1 and 5 above, so new manylinux tags
can be adopted more easily.
Copy link
Member

I think it removes step 4 as well? (We'd change warehouse once to allow all the tags, just like how it allows all versions of macos wheels.)

Copy link
Member Author

My assumption is that Warehouse would need to bump the version of auditwheel it's using to pick up the new profile, so there's still a small change to make.

Copy link
Member

Eh, that's kind of orthogonal to this discussion (and anyway they have a bot that handles version bumping), so I'd probably count it? It would remove the need to explicitly write code and write a test. But it's not a big deal either way.

pep-perennial.rst Outdated Show resolved Hide resolved
Where ``2_17`` is the major and minor version of glibc. I.e. for this example,
the platform must have glibc 2.17 or newer. Installer tools should be prepared
to handle any numeric values here, but building and publishing wheels to PyPI
will probably be constrained to specific profiles defined by auditwheel.
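To make the rule in this passage concrete, here is a minimal sketch of how an installer might read such a tag. It assumes the `manylinux_glibc_<major>_<minor>_<arch>` spelling used in this draft (the spelling is still under discussion), and `parse_tag` and `platform_satisfies` are hypothetical helper names, not part of pip or auditwheel.

```python
import re

# Pattern for the draft spelling, e.g. manylinux_glibc_2_17_x86_64.
# Any numeric major/minor values are accepted, as the text asks.
_TAG_RE = re.compile(r"^manylinux_glibc_(\d+)_(\d+)_(\w+)$")


def parse_tag(tag):
    """Return ((major, minor), arch), or None if the tag does not match."""
    match = _TAG_RE.match(tag)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2))), match.group(3)


def platform_satisfies(tag, glibc_version, arch):
    """True if a platform with this glibc version and arch meets the tag."""
    parsed = parse_tag(tag)
    if parsed is None:
        return False
    required, tag_arch = parsed
    # "glibc 2.17 or newer": compare (major, minor) tuples.
    return tag_arch == arch and tuple(glibc_version) >= required


print(parse_tag("manylinux_glibc_2_17_x86_64"))
print(platform_satisfies("manylinux_glibc_2_17_x86_64", (2, 28), "x86_64"))
```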
Copy link
Member

This is the bikeshedding part... instead of manylinux_glibc_2_17_x86_64, I would prefer manylinux_17_x86_64. Rationale:

  • Shorter, more to the point – readability counts. All else being equal, super-long filenames are annoying to work with.
  • Makes explicit that we aren't trying to support any hypothetical "glibc 3" here. (I doubt this will ever exist, and there's no way to prepare in advance for it. We probably shouldn't assume a 2.x wheel will work on a 3.y system, even though 3 > 2. The existing manylinux profiles explicitly only support 2.x for x greater than some threshold.)
  • manylinux_glibc is redundant today, and there's no reason for it not to stay redundant in the future. If we add a profile for alpine, we can just call it alpine_${version}_x86_64 or whatever; making it manylinux_alpine_${version}_x86_64 just makes things longer and buries the crucial information about which systems this wheel will work on.
    • And to the extent it matters, speaking as the person who invented the "manylinux" name... I've always thought of the "many" as being more-or-less a synonym for "glibc"
  • As noted in the discussion thread on Discourse, having glibc_X_Y suggests to some users that this wheel requires exactly glibc vX.Y, which is misleading. manylinux_Y avoids this.

Since this is bikeshedding I'm not going to get into a battle-of-wills over it, but I wanted to lay out the arguments at least once.

Copy link
Member Author

I guess the counterargument is that manylinux_17 doesn't have much clear meaning, especially in the context of having already had manylinux1 and manylinux2010.

Leaving out the 2_ also doesn't give us an obvious future numbering scheme if glibc 3.x does turn up one day, whether that's a compatibility-breaking release like Python 3 or a continuation more like Linux 3.

I don't have strong feelings over this, and I'm assuming you've thought more about this whole space than I have. But maybe if it needs further discussion we should take that somewhere where more people will participate. (There's nothing like more people for a naming discussion!)

Copy link
Member

It's awkward right now, but in a few years manylinux1 and manylinux2010 will be gone and forgotten. And the problem with mentioning glibc explicitly is that several people thought it had a clear meaning, but were wrong about what that meaning was :-).

manylinux_2_17_x86_64 isn't terrible either. But to explain why I'm not worried about glibc 3: Well first, glibc will definitely never pull a Linux 3, because the version corresponds to the soname. A glibc 3 would effectively be a totally new project with absolutely no binary compatibility; it would have no more connection to glibc 2 than it would to musl or dietlibc.

In theory it's possible that the glibc devs would decide to fork off a new project, call it glibc 3, and that all the distros would decide to switch to it. But they've been maintaining ABI compatibility for 22 years now; presumably if the ABI had a fundamental unfixable problem they'd have noticed by now. So it's kind of like preparing for a meteorite strike: sure, it could happen, but... something else will get you first :-). I think it's at least as plausible that all the distros will switch to musl, or that Linux will get replaced by Fuchsia, or that x86-64 will get replaced by RISC-V, as that glibc 3 will appear and take over the world. And if it did, it would still take like a decade to transition, so we'd have plenty of time to figure out what to do.

If we want to be really thorough with our due-diligence we could contact the glibc devs and ask them to explicitly confirm that glibc 3 is never going to happen.

Copy link

As a part-time glibc developer, I can tell you that libc.so.7 will almost surely never happen; however, there has been some discussion of a version 3.0 of the glibc source distribution. It would still build libc.so.6 but with some of the very oldest compatibility symbols dropped (relating to things like the attempt to make stdio FILE be the same thing as libstdc++ filebuf, which was abandoned as unworkable more than 20 years ago, but still complicates the stdio implementation). This is not likely to happen anytime soon.

Copy link
Member

@njsmith njsmith Jun 4, 2019

@zackw Interesting.

The challenge with a 3.0 source version is that we basically assume that we can check the "ABI compatibility level" of glibc by calling gnu_get_libc_version, and that this will return a string like "2.${ABI version}${arbitrary vendor-specific junk}", where ${ABI version} is an integer. Then we do automated things with ${ABI version}.

(And note that this is orthogonal to the "perennial" part of this proposal – a hypothetical glibc 3.0 release would also break manylinux1, manylinux2010, manylinux2014, etc.)

Anyway, in addition to breaking on a hypothetical future glibc 3.0, the current string-parsing hack is kinda gross, and means the glibc devs have API surface area that they don't know about. Maybe this means that they can never use 3.0 as a release name, because it would break all existing manylinux wheels. It's awkward all around.

I suspect the right solution is for glibc to add a new API, like gnu_get_libc_abi_level, which returns an integer. This integer has the following properties:

  • for old glibc 2.X versions, the ABI level is retroactively defined to equal the minor version number, so we can fall back on gnu_get_libc_version + string parsing
  • code built against a glibc with ABI level X will also work when run against a glibc with ABI level Y, so long as X <= Y

That lets us (eventually) drop our gross version parsing regex, makes the actual guarantees more visible to the glibc devs, and potentially provides a path to releasing 3.0 without breaking everyone (gnu_get_libc_version can return "3.0" but gnu_get_libc_abi_level would keep incrementing monotonically). And then the perennial manylinux tag name would be defined as manylinux_${glibc ABI level}_${platform}. (Which is conceptually different from the manylinux_${glibc minor version number}_${platform} I suggested above, but identical in practice.)
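For readers who have not seen the string-parsing approach, a minimal sketch of it follows. It only uses `gnu_get_libc_version`, which exists in glibc today; `gnu_get_libc_abi_level` is the hypothetical API proposed above and is not called here. The helper names are illustrative assumptions and this is not pip's actual code.

```python
import ctypes
import re

# Leading "major.minor"; anything after it is vendor-specific and ignored.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)")


def abi_level_from_version(version):
    """Map a "2.X<junk>" string to the integer X, or None otherwise.

    This is the retroactive definition from the first bullet: for
    glibc 2.X the ABI level equals the minor version number.
    """
    match = _VERSION_RE.match(version)
    if match is None or int(match.group(1)) != 2:
        return None
    return int(match.group(2))


def runs_on(built_level, host_level):
    """Second bullet: code built at level X works at level Y if X <= Y."""
    return built_level <= host_level


def host_glibc_version():
    """Return the gnu_get_libc_version() string, or None if unavailable."""
    try:
        func = ctypes.CDLL(None).gnu_get_libc_version
    except (OSError, AttributeError, TypeError):
        return None  # not glibc (e.g. musl, macOS, Windows)
    func.restype = ctypes.c_char_p
    return func().decode("ascii", "replace")


print(abi_level_from_version("2.17"), host_glibc_version())
```

Note that `abi_level_from_version("3.0")` returns `None` in this sketch, which is exactly the breakage on a hypothetical 3.0 release described above.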

Does that sound sensible to you?

If we wanted to bring this to the glibc devs, how would we do that? (And is there any way I can do it without signing up for the glibc dev firehose?)

Copy link
Member

Bringing over my preference from the Discourse thread: my view is the opposite of Nathaniel's, and would favour the slightly longer manylinux_bp_glibc_2_17_x86_64. The intent of the _bp addition is that it makes the first part stand for "manylinux build profile", and then the rest of the string names the exact build profile. If we're worried about filename length, then we could abbreviate it further to mlbp_glibc_2_17_x86_64, but that has the downside of losing the useful property of having "linux" as a visible part of the filename.

Copy link
Member

Based on the discussion in that glibc issue, it sounds like the recommendation is to assume that glibc 3.x might happen, and to assume that the major version bump won't indicate a compatibility break. So I'm changing my recommendation here to manylinux_2_17_x86_64.

Also, apparently, we should be going back and revising the previous manylinux PEPs and pip code to assume that glibc 3 is compatible with glibc 2. I guess the simplest way to handle that would be to make this PEP say the right thing, and then say manylinux1 and manylinux2010 are redefined as aliases for manylinux_2_5 and manylinux_2_12, with the new very-slightly-different semantics.
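One way to picture the aliasing is a small lookup that rewrites legacy tags into the proposed spelling before any comparison happens. This is only a sketch: `LEGACY_ALIASES`, `normalize_tag` and `is_compatible` are hypothetical names, and the tuple comparison encodes the assumption stated above that a glibc 3.x would not break compatibility.

```python
# Legacy tag prefixes and the perennial names they would become,
# taken from the two aliases named in the comment above.
LEGACY_ALIASES = {
    "manylinux1": "manylinux_2_5",
    "manylinux2010": "manylinux_2_12",
}


def normalize_tag(tag):
    """Rewrite e.g. manylinux1_x86_64 to manylinux_2_5_x86_64."""
    for legacy, perennial in LEGACY_ALIASES.items():
        prefix = legacy + "_"
        if tag.startswith(prefix):
            return perennial + "_" + tag[len(prefix):]
    return tag  # already perennial, or not a manylinux tag at all


def is_compatible(required, installed):
    """Compare (major, minor) tuples, so (3, 0) counts as newer than (2, 17)."""
    return tuple(installed) >= tuple(required)


print(normalize_tag("manylinux1_x86_64"))
print(normalize_tag("manylinux2010_i686"))
print(is_compatible((2, 12), (3, 0)))
```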

I don't see any value in sticking _bp_glibc_ in there. manylinux and manylinux_bp_glibc mean exactly the same things, except one has extra clutter. What could any manylinux tag even be besides a "build profile"?

Copy link
Member

With that, I'd expect to be able to go look up a spec called "manylinux 2.17". There's no such spec - there's a manylinux 2014 spec that calls out a glibc 2.17 build profile.

So if we're going to switch from naming the spec version in the filename to instead naming the build profile, the filename should reflect that.

Copy link
Member

there's a manylinux 2014 spec

Sure, that's one way we could do things, but... that's not what this PEP is proposing.

I'd expect to be able to go look up a spec called "manylinux 2.17"

The spec you would want is the "manylinux" spec, which seems reasonable. It's exactly the same as how when you see a wheel tagged macos_10_10, you don't look for the "macos 10.10" spec, you look for the "macos" spec. (Well, actually it's much harder to figure out the relevant spec than that – you have to figure out that it's from PEP 425, which incorporates the behavior of distutils.util.get_platform() by reference, and then figure out in what circumstances distutils returns strings that start with macos.)

Copy link

For disambiguation, how about dropping "manylinux", and using linux_2_17_x86_64 ?

pep-perennial.rst Show resolved Hide resolved
Author: Nathaniel J. Smith <[email protected]>
Thomas Kluyver <[email protected]>
BDFL-Delegate: Nick Coghlan <[email protected]>
Discussions-To: Distutils SIG <[email protected]>
Copy link
Member

Do you want to use distutils-sig, or switch to discourse?

Copy link
Member Author

I'm only subscribed to specific threads on Discourse, so if someone starts a new one, I won't necessarily see it. So maybe +0.1 for distutils-sig. But I'm happy to change this if the consensus is that we're doing discussions on Discourse now.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least for this particular discussion, right now, it really does seem like we are doing the discussion on the Discourse thread. So I suggest that we change it to reflect the current reality.

@mayeut
Copy link
Member

mayeut commented Jun 2, 2019

@jjhelmus, what new symbols have been added in gcc 4.8.3 libstdc++ ?
According to the script available in auditwheel, centos 7 is the minimum common denominator for many distributions:

centos7      {"GLIBC": ["2.10", "2.11", "2.12", "2.13", "2.14", "2.15", "2.16", "2.17", "2.2.5", "2.2.6", "2.3", "2.3.2", "2.3.3", "2.3.4", "2.4", "2.5", "2.6", "2.7", "2.8", "2.9"], "CXXABI": ["1.3", "1.3.1", "1.3.2", "1.3.3", "1.3.4", "1.3.5", "1.3.6", "1.3.7", "TM_1"], "GLIBCXX": ["3.4", "3.4.1", "3.4.10", "3.4.11", "3.4.12", "3.4.13", "3.4.14", "3.4.15", "3.4.16", "3.4.17", "3.4.18", "3.4.19", "3.4.2", "3.4.3", "3.4.4", "3.4.5", "3.4.6", "3.4.7", "3.4.8", "3.4.9"], "GCC": ["3.0", "3.3", "3.3.1", "3.4", "3.4.2", "3.4.4", "4.0.0", "4.2.0", "4.3.0", "4.7.0", "4.8.0"]}
amazonlinux2 {"GLIBC": ["2.10", "2.11", "2.12", "2.13", "2.14", "2.15", "2.16", "2.17", "2.2.5", "2.2.6", "2.3", "2.3.2", "2.3.3", "2.3.4", "2.4", "2.5", "2.6", "2.7", "2.8", "2.9"], "CXXABI": ["1.3", "1.3.1", "1.3.2", "1.3.3", "1.3.4", "1.3.5", "1.3.6", "1.3.7", "TM_1"], "GLIBCXX": ["3.4", "3.4.1", "3.4.10", "3.4.11", "3.4.12", "3.4.13", "3.4.14", "3.4.15", "3.4.16", "3.4.17", "3.4.18", "3.4.19", "3.4.2", "3.4.3", "3.4.4", "3.4.5", "3.4.6", "3.4.7", "3.4.8", "3.4.9"], "GCC": ["3.0", "3.3", "3.3.1", "3.4", "3.4.2", "3.4.4", "4.0.0", "4.2.0", "4.3.0", "4.7.0", "4.8.0"]}
             +"GLIBC": "2.18", "2.22", "2.23", "2.24", "2.25", "2.26", "CXXABI": "1.3.8", "1.3.9", "1.3.10", "1.3.11", "FLOAT128", "GLIBCXX": "3.4.20", "3.4.21", "3.4.22", "3.4.23", "3.4.24", "GCC": "7.0.0"
amazonlinux1 {"GLIBC": ["2.10", "2.11", "2.12", "2.13", "2.14", "2.15", "2.16", "2.17", "2.2.5", "2.2.6", "2.3", "2.3.2", "2.3.3", "2.3.4", "2.4", "2.5", "2.6", "2.7", "2.8", "2.9"], "CXXABI": ["1.3", "1.3.1", "1.3.2", "1.3.3", "1.3.4", "1.3.5", "1.3.6", "1.3.7", "TM_1"], "GLIBCXX": ["3.4", "3.4.1", "3.4.10", "3.4.11", "3.4.12", "3.4.13", "3.4.14", "3.4.15", "3.4.16", "3.4.17", "3.4.18", "3.4.19", "3.4.2", "3.4.3", "3.4.4", "3.4.5", "3.4.6", "3.4.7", "3.4.8", "3.4.9"], "GCC": ["3.0", "3.3", "3.3.1", "3.4", "3.4.2", "3.4.4", "4.0.0", "4.2.0", "4.3.0", "4.7.0", "4.8.0"]}
             +"CXXABI": "1.3.8", "1.3.9", "1.3.10", "1.3.11", "FLOAT128", "GLIBCXX": "3.4.20", "3.4.21", "3.4.22", "3.4.23", "3.4.24", "GCC": "7.0.0"
fedora20     {"GLIBC": ["2.10", "2.11", "2.12", "2.13", "2.14", "2.15", "2.16", "2.17", "2.2.5", "2.2.6", "2.3", "2.3.2", "2.3.3", "2.3.4", "2.4", "2.5", "2.6", "2.7", "2.8", "2.9"], "CXXABI": ["1.3", "1.3.1", "1.3.2", "1.3.3", "1.3.4", "1.3.5", "1.3.6", "1.3.7", "TM_1"], "GLIBCXX": ["3.4", "3.4.1", "3.4.10", "3.4.11", "3.4.12", "3.4.13", "3.4.14", "3.4.15", "3.4.16", "3.4.17", "3.4.18", "3.4.19", "3.4.2", "3.4.3", "3.4.4", "3.4.5", "3.4.6", "3.4.7", "3.4.8", "3.4.9"], "GCC": ["3.0", "3.3", "3.3.1", "3.4", "3.4.2", "3.4.4", "4.0.0", "4.2.0", "4.3.0", "4.7.0", "4.8.0"]}
             +"GLIBC": "2.18", 
fedora21     {"GLIBC": ["2.10", "2.11", "2.12", "2.13", "2.14", "2.15", "2.16", "2.17", "2.2.5", "2.2.6", "2.3", "2.3.2", "2.3.3", "2.3.4", "2.4", "2.5", "2.6", "2.7", "2.8", "2.9"], "CXXABI": ["1.3", "1.3.1", "1.3.2", "1.3.3", "1.3.4", "1.3.5", "1.3.6", "1.3.7", "TM_1"], "GLIBCXX": ["3.4", "3.4.1", "3.4.10", "3.4.11", "3.4.12", "3.4.13", "3.4.14", "3.4.15", "3.4.16", "3.4.17", "3.4.18", "3.4.19", "3.4.2", "3.4.3", "3.4.4", "3.4.5", "3.4.6", "3.4.7", "3.4.8", "3.4.9"], "GCC": ["3.0", "3.3", "3.3.1", "3.4", "3.4.2", "3.4.4", "4.0.0", "4.2.0", "4.3.0", "4.7.0", "4.8.0"]}
             +"GLIBC": "2.18", "CXXABI": "1.3.8", "GLIBCXX": "3.4.20",
debian8      {"GLIBC": ["2.10", "2.11", "2.12", "2.13", "2.14", "2.15", "2.16", "2.17", "2.2.5", "2.2.6", "2.3", "2.3.2", "2.3.3", "2.3.4", "2.4", "2.5", "2.6", "2.7", "2.8", "2.9"], "CXXABI": ["1.3", "1.3.1", "1.3.2", "1.3.3", "1.3.4", "1.3.5", "1.3.6", "1.3.7", "TM_1"], "GLIBCXX": ["3.4", "3.4.1", "3.4.10", "3.4.11", "3.4.12", "3.4.13", "3.4.14", "3.4.15", "3.4.16", "3.4.17", "3.4.18", "3.4.19", "3.4.2", "3.4.3", "3.4.4", "3.4.5", "3.4.6", "3.4.7", "3.4.8", "3.4.9"], "GCC": ["3.0", "3.3", "3.3.1", "3.4", "3.4.2", "3.4.4", "4.0.0", "4.2.0", "4.3.0", "4.7.0", "4.8.0"]}
             +"GLIBC": "2.18", "CXXABI": "1.3.8", "GLIBCXX": "3.4.20",
ubuntu14.04  {"GLIBC": ["2.10", "2.11", "2.12", "2.13", "2.14", "2.15", "2.16", "2.17", "2.2.5", "2.2.6", "2.3", "2.3.2", "2.3.3", "2.3.4", "2.4", "2.5", "2.6", "2.7", "2.8", "2.9"], "CXXABI": ["1.3", "1.3.1", "1.3.2", "1.3.3", "1.3.4", "1.3.5", "1.3.6", "1.3.7", "TM_1"], "GLIBCXX": ["3.4", "3.4.1", "3.4.10", "3.4.11", "3.4.12", "3.4.13", "3.4.14", "3.4.15", "3.4.16", "3.4.17", "3.4.18", "3.4.19", "3.4.2", "3.4.3", "3.4.4", "3.4.5", "3.4.6", "3.4.7", "3.4.8", "3.4.9"], "GCC": ["3.0", "3.3", "3.3.1", "3.4", "3.4.2", "3.4.4", "4.0.0", "4.2.0", "4.3.0", "4.7.0", "4.8.0"]}
             +"GLIBC": "2.18",
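The "common denominator" reading of this output can be sketched as a set intersection. The data below is a heavily trimmed, illustrative subset of the GLIBCXX columns above (only a few versions per distro are kept), and `version_key` and `common_versions` are hypothetical helpers, not the auditwheel script itself.

```python
# Trimmed subset of the GLIBCXX versions printed above; the real lists
# are much longer. The extra entries mirror the "+" lines.
GLIBCXX = {
    "centos7": ["3.4.9", "3.4.10", "3.4.18", "3.4.19"],
    "fedora21": ["3.4.9", "3.4.10", "3.4.18", "3.4.19", "3.4.20"],
    "amazonlinux2": ["3.4.9", "3.4.10", "3.4.18", "3.4.19", "3.4.20", "3.4.21"],
}


def version_key(version):
    """Sort key that compares "3.4.10" after "3.4.9" (numeric, not textual)."""
    return tuple(int(part) for part in version.split("."))


def common_versions(per_distro):
    """Symbol versions that every surveyed distro provides."""
    return set.intersection(*(set(v) for v in per_distro.values()))


common = common_versions(GLIBCXX)
print(sorted(common, key=version_key))
print(max(common, key=version_key))
```

The numeric sort key matters because a plain string comparison would rank "3.4.9" above "3.4.19".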

@mayeut
Copy link
Member

mayeut commented Jun 2, 2019

Ubuntu 14 was updated at some point in its lifetime to switch to gcc 4.8.4 which provides the symbols introduced in GCC 4.8.3 (GLIBCXX_3.4.19)

But the exact details of auditwheel's checks will evolve over time as the Linux
distribution landscape changes and as we learn more about real-world
compatibility pitfalls.

Copy link

I haven't done an exhaustive investigation here, but I want to raise the issue of source and binary compatibility for C++ code right now: this should be explicitly addressed in the PEP. There is a catch-22 here:

  • New features are constantly being added to C++ the language; new code has a good chance of failing to compile with an old compiler, and contrariwise old code has a good chance of failing to compile with a new compiler. Each new version of GCC adds many new symbols to libstdc++.so.6.
  • However, bundling libstdc++ in wheels is not safe, because it has internal global state. If there is more than one copy of libstdc++ loaded into a process, the behavior of the entire program becomes undefined, in the sense in which the C and C++ standards use that term.

This may mean that it is necessary to write the version of the C++ compiler into the name of the wheel, and to say that manylinux_glibc_X_g++_Y is not coinstallable with manylinux_glibc_X_g++_Z where Y ≠ Z. It's possible that there's a cleverer option, but I am not sure what it would be.

Copy link
Member

Huh, is that true even if I have two libraries that statically link their own copies of libstdc++ (or that dynamically load libstdc++ into disjoint ELF namespaces), and whose public APIs are entirely C based?

Anyway, the way we handle this right now is basically that if you want to make a wheel that targets, say, 2010-era Linux, then you dynamically link to the system libstdc++ and you assume it only contains 2010-era C++ symbols.

This is somewhat restrictive, but less restrictive than you would think. Redhat distributes newer compilers that contain clever linker scripts, that use the system symbols whenever they're available, and for any unavailable symbols it statically links them into the binary. These are the compilers we provide in the manylinux images.

And anyway, I don't think we can hope to do any better. For manylinux wheels, we generally assume that they might have to coexist with arbitrary other extensions, built in arbitrary ways. For example, there might be packages that the user compiled themselves using their regular distro compiler, and that link to the system libstdc++. According to your rule, this means that we can never vendor libstdc++ at all; we have to use the system libstdc++. And if we're stuck using the system libstdc++, then the glibc version pretty much tells us what that is, so there's not much point in putting it in the wheel filename.

Copy link

If there is more than one copy of libstdc++ loaded into a process, the behavior of the entire program becomes undefined

is that true even if I have two libraries that statically link their own copies of libstdc++ (or that dynamically load libstdc++ into disjoint ELF namespaces), and whose public APIs are entirely C based?

Yes. The root issue is that each copy of the library assumes it has the only copy of a global data structure (such as: tables of exception handling information, tables of RTTI information, the cin object, the canonical empty std::string, etc). Malfunctions I've personally witnessed include std::terminate and/or std::unexpected getting called in a program that should never have called them, typeid invariants being violated, and segmentation faults upon attempting to write to a std::string object that should have been mutable. I wouldn't be surprised if the problem with call_once that came up in one of the precursor discussions to this PEP could be traced to the same issue.

Some of the problems could be mitigated by ensuring that the Python core interpreter executable and/or libpython.so were linked against libgcc_s.so.1; to make this fully reliable, interpreter initialization would need to call a function that is only defined in that library, ideally before any threads have been created. (This would also fix some of the problems currently being muddled together in https://bugs.python.org/issue18748 .) But it wouldn't be a complete solution and I'm not sure what a good function to call would be.

if we're stuck using the system libstdc++, then the glibc version pretty much tells us what that is

This may be true for CentOS but I know some other distributions (e.g. Debian, Arch) do not update GCC on the same schedule as glibc, so it's not perfect.

Documenting a specific gcc and g++ version to be used to build wheels conforming to each rev of manylinux would probably be enough to address the problem, as long as CPython itself continues not to contain any C++ code. However, it was my impression that the idea of perennial manylinux was to avoid having to research and document such things?

Copy link
Member

Yes. The root issue is that each copy of the library assumes it has the only copy of a global data structure (such as: tables of exception handling information, tables of RTTI information, the cin object, the canonical empty std::string, etc).

Sure, I get that part. What I'm missing is what libstdc++ does to thwart all the stuff that linkers do to try to make this situation work anyway. For example if I make my own nonstd::EMPTY_STRING and it gets loaded twice into different ELF namespaces that only communicate via C, it's generally fine – each namespace consistently uses its own copy of the singleton.

Anyway, we don't recommend vendoring libstdc++ so this is mostly academic curiosity, and also in case it gives some hint to the mysterious call_once issue.

This may be true for CentOS but I know some other distributions (e.g. Debian, Arch) do not update GCC on the same schedule as glibc, so it's not perfect.

That's fine – we don't care about the leading edge, only the trailing edge, and the trailing edge is much more stable and boring. So e.g. consider a manylinux_23, targeting systems with glibc 2.23 and newer. glibc 2.23 was released in 2016. If we look around at all the distros currently shipping with glibc 2.23 or later, and compute the minimum libstdc++ version, it's probably going to be something from 2016-ish. The situation where we'd get in trouble is if suddenly a new distro appears that's shipping an older version of libstdc++ than any of the distros we already looked at – so you'd need a new 2019 distro that just started shipping a libstdc++ from 2015-ish. That never happens in practice.

Sure, this means that it's actually impossible to know what the definition of manylinux_X is until some time after glibc 2.X has been out in the wild – the standard says that manylinux_X wheels have to be compatible with all distros shipping glibc 2.X, and that's not determined until all those distros have shipped. But that's unavoidable, and not really a problem since our goal is to achieve broad real-world compatibility, not generate wheels that only work on Arch.

However, it was my impression that the idea of perennial manylinux was to avoid having to research and document such things?

No, the goal is to do the same research and documentation, but decouple it from the PEP review cycle and pip release cycle.

Copy link

@zackw zackw Jun 19, 2019

each copy of the library assumes it has the only copy of a global data structure

What I'm missing is what libstdc++ does to thwart all the stuff that linkers do to try to make this situation work anyway

The uniqueness requirement applies across the entire address space. ELF namespaces don't do a thing to help; in fact they may make the situation worse, by preventing symbol deduplication from happening.

With typeid and exceptions (which rely on typeid comparison) the problem is that typeid objects are compared by address. This means that if you have a std::runtime_error thrown out of code using libstdc++ copy 1, code using libstdc++ copy 2 will be unable to catch it as a std::runtime_error. In fact, I think it can only catch it using catch (...).

I don't know specifically what the problem is with call_once or the empty std::string but I believe it is something along the same lines.

Copy link

Maybe unrelated, but what do the current PEPs say about manylinux1 and manylinux2010 libraries being loaded at the same time? I am guessing that if they both link libstdc++, they may present a similar issue.
Was this issue not considered before?

Copy link

@gunan I'm not sure how I managed to miss your comment until now. You're right, this is a problem right now because, according to current policy, libstdc++ is to be bundled in wheels, so a manylinux1 and a manylinux2010 wheel that both use C++ would each include a different version of the library. But if libstdc++ is added to the don't-bundle list, which I think is a necessary part of the fix anyway, then both wheels will pick it up from the system, and it should be OK as long as it's new enough. (Requiring a specific version of G++ to be used is just to control what "new enough" means. I think.)

Member

libstdc++ has always been on the "don't bundle" list. However, we have recommended the use of the RH-maintained "devtoolset" compilers, which allow the use of new C++ features on old systems by selectively statically linking pieces of newer libstdc++ into the generated binaries, via some linker script magic.

@zackw

zackw commented Jun 3, 2019

Having jumped into the discussion here, it occurs to me I should probably introduce myself. I mostly do Internet security research nowadays, but about fifteen years ago I was one of the GCC core developers and I still work on GNU libc in my copious free time, so I have a bunch of experience with low-level Linux-related issues. @brainwane brought this to my attention as something I might be able to offer advice with. (I'm also an old friend of Nathaniel's but that's neither here nor there.)

@ncoghlan
Member

@zackw Thanks for chiming in! Very useful info, and challenging of some questionable assumptions on our part :)

@ncoghlan
Member

Putting aside the naming scheme bikeshedding question, and the known issues with defining sensible broadly compatible build profiles (which Nathaniel and Zack have well covered), I like this write-up :)

@zackw

zackw commented Jun 19, 2019

Having caught up on the discussion over on discuss.python.org, it seems to me that the remaining gap between proponents of this spec and proponents of manylinux2014 is mostly about documentation.

When people who are not comfortable with perennial say things like "…you can never definitively know that a wheel is manylinux X compatible, only that it satisfies the current recommendations. I’m not entirely comfortable with this" [1] or "there’s no document I can refer to which lets me check if my system is manylinux X compliant, and no set of checks I can run to do so" [2], these appear to me to be requests for documentation.

Most of the pushback from the perennial side seems to be on the grounds that the current process for documenting what "manylinux X" means is burdensome (both because the PEP process is too heavyweight and because it involves a bunch of work repeated in three places: the PEP, the auditwheel profile, and the blessed build environment). Their proposed alternative seems to be to abandon altogether the idea of writing a human-readable spec for each manylinux rev. And I can see where they're coming from, but I can also see how this makes people uncomfortable.

I would like to suggest two things to help bridge the gap. First, I'd like to endorse the utility of what @njsmith said here:

a tool to dump auditwheel profiles into readable text wouldn’t be 100% accurate, but it could be useful. It seems like the simplest thing to do would be to have this tool dump text onto some pages on auditwheel.readthedocs.io, updated with every commit?

The existence of these generated docs would reassure people that they don't have to dig through the entire auditwheel codebase just to find the profile. It would also be a visible place for the rationale for each aspect of the profile to be documented, and thus to encourage auditwheel devs to write down those rationales. The "10% quirky stuff that involves code changes" (from the same post) could be dealt with by presenting the actual code as part of the generated docs: "A manylinux2020 system must have properties A, B, and C. To determine whether it has these properties, call these Python functions: [source code]".
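
As a rough illustration of the "dump profiles into readable text" idea, here is a minimal Python sketch. The profile layout (`name`, `symbol_versions`, `allowed_libraries`, `rationale`) and the values in `EXAMPLE_PROFILE` are assumptions made up for this example; they are not auditwheel's actual data format or policy.

```python
def render_profile(profile):
    """Render one (hypothetical) profile dict as plain-text documentation."""
    lines = [f"Profile: {profile['name']}"]

    lines.append("Maximum symbol versions:")
    for family, version in sorted(profile["symbol_versions"].items()):
        lines.append(f"  {family} <= {version}")

    lines.append("Libraries a wheel may link against without bundling:")
    for soname in sorted(profile["allowed_libraries"]):
        lines.append(f"  {soname}")

    # Rationales are optional; printing them next to the rules is the point
    # of generating the docs from the profile itself.
    for topic, reason in sorted(profile.get("rationale", {}).items()):
        lines.append(f"Rationale ({topic}): {reason}")

    return "\n".join(lines)


# Illustrative values only, not a real auditwheel policy.
EXAMPLE_PROFILE = {
    "name": "manylinux_glibc_2_17",
    "symbol_versions": {"GLIBC": "2.17", "GLIBCXX": "3.4.19"},
    "allowed_libraries": ["libc.so.6", "libm.so.6", "libstdc++.so.6"],
    "rationale": {"libstdc++.so.6": "must come from the system, never bundled"},
}

if __name__ == "__main__":
    print(render_profile(EXAMPLE_PROFILE))
```

A docs build could call something like `render_profile` once per profile on every commit, so the published text never drifts from what the tool enforces.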

Second, I think it would be valuable if this PEP went into some detail about the process of developing a new manylinux rev. It doesn't have to be perfect, but it should be comprehensive enough that someone who hasn't done it before could imagine themselves going through the process, guided by the PEP + existing auditwheel documentation (which the PEP would reference). For instance, there should be a list of decisions that need to be made and criteria for each, guidelines for deciding which Linux distributions should be examined and from how long ago, guidelines for deciding what C and C++ compilers to put into the blessed build image, that sort of thing.

@takluyver
Copy link
Member Author

If I've understood @njsmith's position correctly, it's not that writing a human-readable document is burdensome. I believe he's saying that the specification for a manylinux X wheel should be "one that works on all mainstream distributions with glibc >= X", and that the details of what external libraries auditwheel lets it link against (besides glibc) are an implementation detail which should be free to evolve over time.

I think I understand some of the reasons behind this, and I have a lot of respect for Nathaniel's judgment. But I'm not so far entirely sold on the notion of making the spec that vague, and

(Also, just to remind everyone: I deliberately put this branch in the pypa repo rather than my fork, so anyone with access to this repo can modify it. If you think changes are agreed on, don't wait for me. I'll be fairly busy this week and next).

Where ``2_17`` is the major and minor version of glibc. I.e. for this example,
the platform must have glibc 2.17 or newer. Installer tools should be prepared
to handle any numeric values here, but building and publishing wheels to PyPI
will probably be constrained to specific profiles defined by auditwheel.
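
To make the "glibc 2.17 or newer" rule concrete, here is a minimal sketch of how an installer might evaluate such a tag. It assumes the `manylinux_glibc_<major>_<minor>_<arch>` spelling used in this draft's examples; `parse_tag` and `tag_is_compatible` are hypothetical helper names, not part of pip or any existing library.

```python
import re

# Assumed tag shape, taken from the draft's examples:
# manylinux_glibc_<major>_<minor>_<arch>
_TAG_RE = re.compile(r"^manylinux_glibc_(\d+)_(\d+)_(.+)$")


def parse_tag(tag):
    """Return ((major, minor), arch) for a perennial-style platform tag."""
    match = _TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a manylinux_glibc tag: {tag!r}")
    return (int(match.group(1)), int(match.group(2))), match.group(3)


def tag_is_compatible(tag, system_glibc, system_arch):
    """True if a wheel with this tag may be installed on the given system.

    system_glibc is a (major, minor) tuple. Comparing integer tuples rather
    than strings means any numeric values work, e.g. (2, 9) < (2, 17).
    """
    required, arch = parse_tag(tag)
    return arch == system_arch and system_glibc >= required
```

For example, `tag_is_compatible("manylinux_glibc_2_17_x86_64", (2, 28), "x86_64")` is true, while the same tag is rejected on a system with glibc 2.12.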

For disambiguation, how about dropping "manylinux" and using linux_2_17_x86_64?

- ``manylinux1_x86_64`` becomes ``manylinux_glibc_2_5_x86_64``
- ``manylinux2010_x86_64`` becomes ``manylinux_glibc_2_12_x86_64``

``x86_64`` refers to the CPU architecture, as in previous tags.
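
The two renames above amount to a fixed alias table. A minimal sketch, assuming only the two mappings listed in the draft (the helper name `to_perennial` is hypothetical):

```python
# Legacy tag prefix -> perennial-style prefix, as listed in the draft.
LEGACY_ALIASES = {
    "manylinux1": "manylinux_glibc_2_5",
    "manylinux2010": "manylinux_glibc_2_12",
}


def to_perennial(tag):
    """Rewrite a legacy manylinux tag; return other tags unchanged."""
    for legacy, perennial in LEGACY_ALIASES.items():
        # Match on "<prefix>_" so the architecture suffix, which may itself
        # contain an underscore (x86_64), is carried over untouched.
        if tag.startswith(legacy + "_"):
            return perennial + tag[len(legacy):]
    return tag
```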

One of the "features" we like in the manylinux2014 PEP is the expanded platform support. Is it possible to include that here, or should that be a separate proposal?

Also, it looks like multiple architectures are implied below. It may be useful to add the explicit list here.

Member Author

These were only meant to be examples, but I agree that could be clearer.


A wheel may never use symbols from a newer version of glibc than that indicated
by its tag. Likewise, a wheel with a glibc tag under this scheme may not be
linked against another libc implementation.
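
A checker for the first rule could look roughly like the sketch below. It assumes the symbol references have already been extracted from the wheel's shared objects as strings in the `name@GLIBC_X.Y` form that binutils tools commonly display; the extraction step and the helper names are hypothetical and not taken from auditwheel.

```python
import re

# Matches "@GLIBC_2.14", "@@GLIBC_2.17" and "@GLIBC_2.2.5" at the end of a
# symbol reference. "@GLIBCXX_..." does not match, because an underscore must
# follow "GLIBC" directly.
_GLIBC_REF = re.compile(r"@+GLIBC_(\d+)\.(\d+)(?:\.\d+)?$")


def newest_glibc_required(symbols):
    """Return the newest (major, minor) glibc version referenced, or None."""
    versions = []
    for symbol in symbols:
        match = _GLIBC_REF.search(symbol)
        if match:
            versions.append((int(match.group(1)), int(match.group(2))))
    return max(versions, default=None)


def violates_tag(symbols, tag_glibc):
    """True if any symbol needs a newer glibc than the tag's (major, minor)."""
    newest = newest_glibc_required(symbols)
    return newest is not None and newest > tag_glibc
```

With this sketch, a wheel that references `memcpy@GLIBC_2.14` would be rejected for a 2.12 tag but accepted for a 2.17 tag.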

Not that this is good or bad, just checking.
Does this forbid users from using any alternative libc implementations?
http://www.etalabs.net/compare_libcs.html

Member Author

For the time being, we're only trying to specify binary compatibility for platforms based on glibc, because that's where people have already done the work to figure out how to make things work.

This doesn't forbid users from using another libc implementation, but this spec is not meant to cover distributing pre-built binaries to those users. They would have to build extension modules from source until someone has figured out another spec.

The main use case I've heard about for an alternative libc is Docker containers based on Alpine Linux, which uses musl. One approach that has been suggested is to standardise Alpine-specific tags for that use case.

@takluyver
Member Author

This is now in the main PEPs repo and published (though not yet accepted) as PEP 600, so I'll close this.
