Handling of the Description metadata field #867

Open
dnicolodi opened this issue Jan 23, 2025 · 3 comments

Comments

@dnicolodi

Since metadata version 2.1, the package description can go into the body of the RFC822 metadata serialization format. Before then, it was encoded in the Description header using a custom multi-line serialization in which continuation lines are prefixed with 7 spaces and a |; see https://packaging.python.org/en/latest/specifications/core-metadata/#description
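
For illustration, here is a minimal sketch of the two places the description can live, read with the standard library email parser rather than with packaging. The metadata strings are made up for the example, and the variable names are hypothetical.

```python
from email import message_from_string

# Made-up metadata with the description folded into the header (pre-2.1 style).
HEADER_STYLE = (
    "Metadata-Version: 1.2\n"
    "Name: example\n"
    "Version: 1.0\n"
    "Description: First line\n"
    "       |Second line\n"
    "       |\n"
    "       |Fourth line\n"
)

# Made-up metadata with the description in the message body (2.1 and later).
BODY_STYLE = (
    "Metadata-Version: 2.1\n"
    "Name: example\n"
    "Version: 1.0\n"
    "\n"
    "First line\n"
    "Second line\n"
)

header_msg = message_from_string(HEADER_STYLE)
body_msg = message_from_string(BODY_STYLE)

# With the default (compat32) policy the header value keeps the
# continuation prefix, so the consumer still has to strip it.
raw_header_description = header_msg["Description"]

# The body form needs no unfolding at all.
body_description = body_msg.get_payload()
```

Here the second line of `raw_header_description` still carries the 7 spaces and the |, while `body_description` is the text as written.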

packaging.metadata.RawMetadata and packaging.metadata.Metadata do not interpret the continuation-line prefix, so consumers of the description field of these objects need to strip it themselves. This is a bit inconvenient and it may result in different tools interpreting the metadata differently. Would it be desirable to add continuation lines interpretation to packaging.metadata?

One twist on this is that setuptools (at least in version 45.2.0, which is still used in the wild, see pypa/twine#1218) generates continuation lines with the wrong prefix: the | is missing. How would this need to be treated?

The same applies also to the License field which in some projects is used to store the whole text of the license as a multi-line field, although with a different format which does not use the | character. Fixing the leading space there would also be nice. The License field will slowly be replaced by the License-Expression and License-File fields, but the need to parse metadata from existing distributions remains, and the tail of packages building with outdated build backends is very long.

@brettcannon
Member

Would it be desirable to add continuation lines interpretation to packaging.metadata?

I think it depends on how much it complicates the code. If I remember correctly, the description and license fields were so messy and inconsistent to parse we just didn't want to try and make it work for all the possible formats.

One twist on this is that setuptools (at least in version 45.2.0, which is still used in the wild, see pypa/twine#1218) generates continuation lines with the wrong prefix: the | is missing. How would this need to be treated?

As a bug and best left to 3rd-party code to handle.

The same applies also to the License field which in some projects is used to store the whole text of the license as a multi-line field, although with a different format which does not use the | character.

It's actually worse than that as https://packaging.python.org/en/latest/specifications/core-metadata/#license doesn't outline a format, it just gives an example.

@dnicolodi
Author

I think it depends on how much it complicates the code. If I remember correctly, the description and license fields were so messy and inconsistent to parse we just didn't want to try and make it work for all the possible formats.

The code that handles the format described in the metadata standard and the format used by setuptools is very easy:

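def _dedent(string: str) -> str:
    """Strip the 8-character continuation prefix from a multi-line field."""
    lines = string.splitlines()
    rest = lines[1:]
    # Standard prefix: 7 spaces and "|". setuptools prefix: 8 spaces.
    if all(line.startswith("       |") for line in rest) or all(
        line.startswith("        ") for line in rest
    ):
        return "\n".join([*lines[:1], *(line[8:] for line in rest)])
    # Unknown format: return the value untouched.
    return string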

However, it makes sense to have this in packaging if there is interest in supporting at least the standard format and the one used by setuptools which is widely deployed. Otherwise, users that want to deal with packages in the wild will still need to carry 90% of this code, which is not much of an improvement, IMO.

I understand the reason behind supporting only the standard, but it seems that no package build backend has ever implemented it: until its penultimate release, twine used a metadata parser that implemented only the setuptools format, and no one ever complained. I think that all other packages implementing metadata writing went straight to putting the package description into the RFC822 message body.

It's actually worse than that

Yeah, I know. It is too bad that the standard is so vague there. The sensible thing to do there is to remove a common whitespace prefix, if any.
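
A minimal sketch of that idea, assuming the first line follows the field name directly and so carries no indentation of its own. The helper name `dedent_continuation` and the sample text are hypothetical; the only library call is the standard library's `textwrap.dedent`.

```python
import textwrap


def dedent_continuation(value: str) -> str:
    """Strip the whitespace prefix common to all lines after the first."""
    first, sep, rest = value.partition("\n")
    if not sep:
        # Single-line value: nothing to dedent.
        return value
    # textwrap.dedent ignores whitespace-only lines when computing the
    # common margin and normalizes them to empty lines.
    return first + sep + textwrap.dedent(rest)


# Made-up License value with 8-space continuation lines.
license_value = (
    "Example License\n"
    "        Copyright (c) Example\n"
    "        \n"
    "          Permission is granted"
)

cleaned = dedent_continuation(license_value)
assert cleaned == (
    "Example License\n"
    "Copyright (c) Example\n"
    "\n"
    "  Permission is granted"
)
```

Because only the common prefix is removed, any deeper indentation inside the license text survives.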

@brettcannon
Member

However, it makes sense to have this in packaging if there is interest in supporting at least the standard format and the one used by setuptools which is widely deployed.

I get the pragmatism of supporting what setuptools did for a long time, but as the person who has to support it in this package, I would rather stick with the spec and let people build on top of 'packaging' as they deem necessary with whatever extra features they want.

I understand the reason behind supporting only the standard, but it seems that no package build backend has ever implemented it

That would explain why no one has brought it up until now then.
