
perl:data-munge : v0.10 is sunk by error #697

Closed
jrmarino opened this issue Apr 24, 2023 · 29 comments

Comments

@jrmarino
Contributor

Projects affected

https://repology.org/project/perl:data-munge/versions

Observed behavior

The new version 0.10 is considered older than (inferior to) version 0.097, which is 6 years older.
I believe 0.10 has been mistakenly sunk.

Expected behavior
Version 0.10 should be considered the latest.
It is numerically higher (0.10 > 0.097) and it's 6 years newer.

Proof

[Screenshot]

@AMDmi3
Member

AMDmi3 commented Apr 24, 2023

This is an upstream problem. Versions are not numbers, and 0.10 < 0.97 (which is equivalent to 0.097).
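A minimal Python sketch of this point, assuming a simplified comparator that splits a version on dots and compares each component as an integer (real libraries such as Repology's or Python's `packaging` handle many more cases; the helper `dotted_key` is hypothetical). Parsing "097" as an integer drops the leading zero, which is why 0.097 and 0.97 compare as equal:

```python
def dotted_key(version: str) -> tuple:
    """Split a version on dots and read each component as an integer."""
    return tuple(int(part) for part in version.split("."))


# int("097") == 97, so the leading zero carries no weight.
print(dotted_key("0.097") == dotted_key("0.97"))  # True

# Component-wise, (0, 10) < (0, 97): 0.10 sorts before 0.097.
print(dotted_key("0.10") < dotted_key("0.097"))  # True
```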

AMDmi3 closed this as not planned on Apr 24, 2023
@jrmarino
Contributor Author

Can you please explain this clearly? It's not "0.97", it is "0.097". You are off by a factor of 10.

What do you mean "0.097" equals "0.97"? That makes zero sense.

What do you mean "versions are not numbers"? Every single character in the version is a digit or a period. It's a number.

Can you please explain what the actual problem is?

@jrmarino
Contributor Author

jrmarino commented Apr 24, 2023

Multiply both versions by 1000:
0.10 x 1000 = 100
0.097 x 1000 = 97
100 > 97

It's basic math.
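The same arithmetic in Python, reading both strings as exact decimal fractions with the standard `decimal` module (used here instead of `float` to avoid binary rounding). Under this numeric interpretation 0.10 is the newer version, and trailing zeros do not change the value:

```python
from decimal import Decimal

old = Decimal("0.097")
new = Decimal("0.10")

# Multiply both by 1000, as above: 97 versus 100.
print(old * 1000 == 97, new * 1000 == 100)  # True True
print(new > old)  # True

# Trailing zeros do not change a decimal value.
print(Decimal("0.10") == Decimal("0.100"))  # True
```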

@jrmarino
Contributor Author

I'm starting to figure out what you are saying.
You are saying you consider versions to be text.
However, if your text comparison treats 0.097 as equal to 0.97, I don't see how that's an upstream problem. That seems like a flaw on Repology's side to me.

@jrmarino
Contributor Author

jrmarino commented Apr 24, 2023

I suggest repology software be improved to either

  1. support numbers as versions (either when indicated, or when a version contains only digits and at most one period)
  2. consider leading zeros as significant (when indicated)

data::munge has a perfectly valid version scheme.
It would be a completely understandable response to say repology needs to fix their software rather than upstream having to assign a new version to make repology happy.

This shortcoming is clearly on repology's side.
Surely you can figure out a solution here.

@AMDmi3
Member

AMDmi3 commented Apr 24, 2023

you are saying you consider versions to be text

No, I am not. Versions are neither text nor numbers; versions are versions and are compared in a special way. For repology, a dedicated library was written which can handle most cases meaningfully, and most package managers implement their own version comparison code. For all of these 0.10 is less than 0.097, though. Just to give a few examples:

% pkg version -t 0.10 0.097
<
% version_compare 0.10 0.097
<
% dpkg --compare-versions 0.10 lt 0.097 && echo YES || echo NO
YES
% python3 -c 'from packaging import version; print(version.parse("0.10") < version.parse("0.097"))'
True

So this problem has nothing to do with Repology: data::munge versioning is not compatible with any mechanisms which use its version, so it is broken and should be fixed upstream. Repology does not fix stuff which needs to be fixed elsewhere.

@jrmarino
Contributor Author

jrmarino commented Apr 24, 2023

That is a compelling answer. It's a shame it took about 6 attempts to get it.
It would be worth copying this to a FAQ somewhere.
I will attempt to point the author of data::munge to this page.

edit: I emailed Lukas Mai. Hopefully he agrees to update the version number.

@mauke

mauke commented Apr 25, 2023

Repology does not fix stuff which needs to be fixed elsewhere.

Have fun telling every single Perl module author and the language itself that they need to change what they have been doing since about 1990. That is, old-style Perl version numbers are (fractional) numbers and need to be compared numerically.

See also:

  • Amazon::S3 (version 0.44 is older than 0.441 is older than 0.45)
  • Prima (version 1.67 is older than 1.67001 is older than 1.68)
  • Function::Parameters (version 2.001006 is older than 2.002 is older than 2.002001)
  • Task::MusicBundle (version 0.05 is older than 0.0501 is older than 0.06)
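A sketch that checks each release sequence listed above under both readings: as decimal numbers (Perl's traditional interpretation) and as dot-separated integers (the simplified comparator assumed here; `dotted_key` and `is_increasing` are hypothetical helpers). Every sequence increases numerically, but none increases component-wise:

```python
from decimal import Decimal


def dotted_key(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))


# Release sequences listed above, oldest first.
SEQUENCES = {
    "Amazon::S3": ["0.44", "0.441", "0.45"],
    "Prima": ["1.67", "1.67001", "1.68"],
    "Function::Parameters": ["2.001006", "2.002", "2.002001"],
    "Task::MusicBundle": ["0.05", "0.0501", "0.06"],
}


def is_increasing(versions, key) -> bool:
    """True if every version sorts strictly after the previous one under `key`."""
    keys = [key(v) for v in versions]
    return all(a < b for a, b in zip(keys, keys[1:]))


for name, versions in SEQUENCES.items():
    print(f"{name}: numeric={is_increasing(versions, Decimal)} "
          f"dotted={is_increasing(versions, dotted_key)}")
```

For example, Amazon::S3 fails the dotted check because (0, 441) sorts after (0, 45).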

I don't think a change there is going to happen anytime soon. You'd have to patch every little deployed piece of code and utility that compares version numbers using <, which works just fine with the status quo.

data::munge versioning is not compatible with any mechanisms which use its version

... except for the Perl ecosystem that it was developed for.


PS: I like how your website says that CPAN has an "outdated version" when that is upstream (CPAN is the original release channel).

@jrmarino
Contributor Author

I checked all four of these examples with "pkg version -t". All four show the wrong comparison on at least one part of the example. It's a pretty good rebuttal.

@jrmarino
Contributor Author

That being said, if all these package managers have the same behavior, does that mean Perl modules have been problematic for years? Is this a long-standing, well-known issue that package system users simply live with, knowing that their system will not properly detect when newer versions are available?

@AMDmi3
Member

AMDmi3 commented Apr 25, 2023

Have fun telling every single Perl module author and the language itself that they need to change what they have been doing since about 1990

There's no need to, as the overwhelming majority of authors are using compatible versions (1). In fact, this is the first report of a Perl module version problem I've received in 5 years of Repology and ~7k other reports.

You'd have to patch every little deployed piece of code and utility that compares version numbers using <, which works just fine with the status quo.

There's no need to change any code or any rules, since compatible versions can and should be used instead. In this case, for instance, the release after 0.097 could have been 0.100.

PS: I like how your website says that CPAN has an "outdated version" when that is upstream (CPAN is the original release channel).

Garbage in - garbage out, obviously. You can see though that in 98% cases CPAN has the newest version (and that only includes modules packaged in some repository) because (1). That's even more than PyPI, which does not suffer from any version scheme ambiguities. CPAN and PyPI are not always the actual upstreams, though, so less than 100% is expected for both.

@jrmarino
Copy link
Contributor Author

jrmarino commented Apr 25, 2023 via email

@mauke

mauke commented Apr 25, 2023

PS: I like how your website says that CPAN has an "outdated version" when that is upstream (CPAN is the original release channel).

You can see though that in 98% cases CPAN has the newest version (and that only includes modules packaged in some repository) because (1).

Have you looked at those cases in detail? There's some interesting stuff there. For example:

  • https://repology.org/project/perl:ai-fuzzy/versions lists different versions for CPAN and MetaCPAN. That makes no sense because CPAN is the code archive and MetaCPAN is the friendly web front-end and API provider for CPAN. In particular, repology claims the latest CPAN version of AI::Fuzzy is 0.01, but the latest release 0.05 is on CPAN as https://www.cpan.org/authors/id/T/TS/TSCANLAN/AI-Fuzzy-0.05.tar.gz (otherwise it wouldn't be visible through MetaCPAN). No idea why repology thinks it is missing from CPAN.
  • https://repology.org/project/perl:anyevent-fastping/versions shows AnyEvent::FastPing being outdated everywhere except for AUR. In reality the reverse is true: Just like with Data::Munge, AnyEvent::FastPing version 2.1 supersedes version 2.02.
  • https://repology.org/project/perl:dbix-fetchloop/versions is a similar case: The latest version (0.6) is marked as outdated because an older release (0.41) got packaged for PLD Linux.
  • https://repology.org/project/perl:coro/versions shows Coro being outdated on CPAN: version 6.57 as opposed to 6.570 in most other places. But those are the same number!
  • https://repology.org/project/perl:dbix-easy/versions is particularly confused. There are two different (and unrelated) DBIx::Easy modules: The original one (first released in 1999), latest release at version 0.21 in 2014, and an unofficial release by a different author using the same name, released between 2002 and 2006 (latest version: 1.40). The latter was never recognized as an official release of DBIx::Easy by CPAN and was subsequently deleted by its author. Repology thinks it is a thing because somehow it got packaged for PLD Linux, but using the module description of the first/official DBIx::Easy ("DBIx::Easy is an easy to use DBI interface. Currently the Pg, mSQL, mysql, Sybase, ODBC and XBase drivers are supported.").
  • https://repology.org/project/perl:net-ssh/versions shows Net::SSH being outdated at version 0.09 everywhere, except for NetBSD/pkgsrc, which somehow has version 2.14. However, a closer look reveals that pkgsrc provides Net::SSH::Perl at version 2.14, which is a different (and unrelated) module.

That's just a few random results from the first page.

(And in general, there are lots of cases where a distro version like 1.230 is considered "unknown" by repology because it doesn't match the "official" version of 1.23, even though they are equal numerically.)
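A two-line illustration of this mismatch, again assuming the simplified dot-separated-integer comparator (`dotted_key` is a hypothetical helper): 1.230 and 1.23 are equal as numbers but differ as components, (1, 230) versus (1, 23):

```python
from decimal import Decimal


def dotted_key(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))


print(Decimal("1.230") == Decimal("1.23"))        # True: equal as numbers
print(dotted_key("1.230") == dotted_key("1.23"))  # False: (1, 230) vs (1, 23)
```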

Unfortunately, the path of least resistance is to talk to the Perl module authors, I think. They aren't going to want their choice of version number to break package maintenance, especially when the solution is relatively easy on their part.

I disagree. I don't think it makes sense to rely on hundreds (or thousands) of module authors to manually work around bugs in automated versioning systems (and treating fractional numbers as multi-component versions is a bug). At some point, someone is going to slip up and things are going to break. Why expend all this effort when you could fix the code?

To do it right, the versioning code would need to support two types of version: The multi-component format (where each part is compared separately as an integer) and the fractional number format (where the whole thing is a single number). If that's not possible, the ingestion code should pad each fractional number with 0s on the right, up to some fixed number of decimal places. That way 1.23, 1.230, and 1.23000 would all map to 1.230000.0 (or similar).
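A minimal sketch of the padding approach, assuming every incoming CPAN version is a plain decimal with at most one dot and at most six fractional digits (`normalize_decimal`, the six-digit width, and the rejection of longer fractions are illustrative choices, not anything Repology implements). After padding, a dot-separated-integer comparison agrees with the numeric one:

```python
def normalize_decimal(version: str, places: int = 6) -> str:
    """Right-pad the fractional part of a decimal version to `places` digits."""
    int_part, _, frac_part = version.partition(".")
    if len(frac_part) > places:
        raise ValueError(f"{version!r} has more than {places} decimal places")
    return f"{int_part}.{frac_part.ljust(places, '0')}"


def dotted_key(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))


for v in ["1.23", "1.230", "1.23000"]:
    print(v, "->", normalize_decimal(v))  # each maps to 1.230000

# 0.097 -> 0.097000 and 0.10 -> 0.100000, so (0, 100000) > (0, 97000).
print(dotted_key(normalize_decimal("0.10")) > dotted_key(normalize_decimal("0.097")))  # True
```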

@AMDmi3
Member

AMDmi3 commented Apr 25, 2023

Have you looked at those cases in detail?

It is obvious there are other similar cases, my point is that most Perl modules are versioned in a compatible way.

No idea why repology thinks it is missing from CPAN.

Repology just conveys what CPAN reports, like I've said "garbage in - garbage out".

% curl -s http://cpan.org/modules/02packages.details.txt.gz | gunzip | grep 'AI::Fuzzy '
AI::Fuzzy                          0.01  S/SA/SABREN/AI-Fuzzy-0.01.tar.gz
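Each entry in `02packages.details.txt` is a whitespace-separated line of package name, version, and distribution path (relative to the `authors/id/` directory). A minimal sketch that parses the entry shown above, offline and ignoring the header block that precedes the entries in the real file (`IndexEntry` and `parse_index_line` are hypothetical names). The indexed distribution is SABREN's AI-Fuzzy-0.01, which is where the 0.01 comes from:

```python
from typing import NamedTuple


class IndexEntry(NamedTuple):
    package: str
    version: str
    path: str


def parse_index_line(line: str) -> IndexEntry:
    """Split one index entry into its three whitespace-separated columns."""
    package, version, path = line.split()
    return IndexEntry(package, version, path)


sample = "AI::Fuzzy                          0.01  S/SA/SABREN/AI-Fuzzy-0.01.tar.gz"
entry = parse_index_line(sample)
print(entry.version)  # 0.01
```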

There are plans to drop CPAN in favor of MetaCPAN, but I'm still hesitant, as there are problems in MetaCPAN as well, and Repology can help reveal them.

I disagree. I don't think it makes sense to rely on hundreds (or thousands) of module authors

About 179 of them ATOW. Definitely more sensible than changing all package managers (just Repology supports 86 different packaging infrastructures) and other software which compares versions to do the impossible thing (1).

to manually work around bugs in automated versioning systems (and treating fractional numbers as multi-component versions is a bug)

Interpretation of the version in its conventional sense is definitely not a bug. I won't dispute the right of Perl developers to define versions however they like, but you can see that both Perl developers and consumers of their work benefit from versioning that is compatible with both worlds, as it allows:

  • straightforward packaging (without the need for epochs, version mangling, extra version components, or other uncomely means to avoid versions going backwards)
  • reliable new release tracking which allows keeping packages up to date
  • vulnerability reporting, which operates on version ranges and is critical for security
  • mapping packages and dependencies between distros and other uses

And there are no technical difficulties for using compatible versioning.

To do it right, the versioning code would need to support two types of version

It is not technically possible as there's no way to know which type a given version belongs to (1).

@AMDmi3
Member

AMDmi3 commented Apr 25, 2023

And just because 98% show the correct top version currently doesn't mean that earlier versions of the same module were all being compared properly.

It does not, but earlier versions are irrelevant for the purposes we're discussing.

@jrmarino
Copy link
Contributor Author

jrmarino commented May 6, 2023

@mauke Are you really not going to re-release as 0.100 or 0.101?
Here's an example of the breakage the current version string caused:
[Screenshot]

I accept it's on the packager community to contact authors individually. Is it really a big deal to pick a version number that works universally as well as in Perl? I don't see a technical solution because, as AMDmi3 said, there's no way for the version libraries to distinguish between Perl and non-Perl versions if only given the version string.

@mauke
Copy link

mauke commented May 6, 2023

@jrmarino I'm not sure what I'm looking at or what repology is.

My understanding of the situation is that someone has built a process for handling versioned packages, and that process is broken: There are (at least) two incompatible ways of doing versions, the dotted integer format (A) and the fractional number format (B). Treating the latter as if it were the former results in the wrong version being marked as "latest" (which is what's happening here).

To my mind, the proper fix is to change the process so it remembers whether any given version is of type A or B and compares them accordingly. If that's not feasible (too expensive), the process needs to convert all incoming versions into format A instead. But blindly processing format B as if it were format A makes no sense.
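A minimal sketch of the first option, assuming the format tag is supplied by whatever ingests the version (for example, an importer that knows the package came from CPAN), since the string alone does not carry it. `Version`, `kind`, and `newer` are hypothetical names; comparisons across kinds are deliberately refused rather than guessed:

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Version:
    text: str
    kind: str  # "dotted" (format A) or "decimal" (format B)

    def key(self):
        if self.kind == "decimal":
            return Decimal(self.text)  # the whole string is one fractional number
        return tuple(int(part) for part in self.text.split("."))  # integer components


def newer(a: Version, b: Version) -> bool:
    """True if a is newer than b; both must carry the same format tag."""
    if a.kind != b.kind:
        raise ValueError("cannot compare versions of different kinds")
    return a.key() > b.key()


print(newer(Version("0.10", "decimal"), Version("0.097", "decimal")))  # True
print(newer(Version("0.10", "dotted"), Version("0.097", "dotted")))    # False
```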

I don't see any technical difficulties about this, either: At some point something is ingesting modules from CPAN. We know what type of versions is used on CPAN, so at that point we can mark or convert the version number as appropriate. (What am I missing?)

Asking me to release a new version for no reason (well, no inherent reason, as there are no actual code changes) feels like the wrong way to go about it. You (that's the general "you", not you personally) have built a process that presumably is intended to solve a real-world problem. But the process is faulty: It treats data in format B as if it were format A, leading to unexpected results. And instead of changing the process so it handles the real world better, you try to change the real-world problem so it better conforms to your chosen solution. Isn't that backwards?

I accept it's on the packager community to contact authors individually.

I mean, if you had a working process (like converting incoming CPAN version numbers from format B to format A), you wouldn't have to contact anyone individually. (At least not for this issue.)

Is it really a big deal to pick a version number that works universally as well as in Perl?

It's a case of format A vs format B. Neither is universal. ("Universal" would include CPAN, and CPAN contains modules in both format A and format B.)

The thing I'm morally offended by is being asked to manually work around a broken process that I didn't know about and that was imposed on me without asking. If it were just a one-time change, I'd have fewer objections, but this is essentially putting extra requirements on all of my future releases. Why is it suddenly my responsibility to ensure that the process you've chosen to use doesn't break? (Also, claiming that a fix is "not technically possible" seems rather disingenuous. Of course it's possible.)

And there are no technical difficulties for using compatible versioning.

Here "compatible" means "compatible with our broken process". There are no technical difficulties for you because you've externalized the required effort onto other people (i.e. module authors like me). On the other hand, there are no technical difficulties for me if your CPAN package import is fixed.

Repology does not fix stuff which needs to be fixed elsewhere.

... so I think I am justified in saying: I do not (like to) fix stuff that needs to be fixed elsewhere.

@jrmarino
Contributor Author

jrmarino commented May 6, 2023

My understanding of the situation is that someone has built a process for handling versioned packages, and that process is broken:

That's not the case. First, it's not a single entity: dozens of mechanisms that compare versions use what is pretty much a standard. Secondly, while it's tempting to call it broken, it's really not.

Treating the latter as if it were the former results in the wrong version being marked as "latest" (which is what's happening here).

We're getting into semantics, but yes, the two approaches can be incompatible. However, Perl versions can fit into this worldview easily (you have to admit, 0.10 and 0.100 are equivalent in Perl versioning, so why not just use the form that works everywhere?)

To my mind, the proper fix is to change the process so it remembers whether any given version is of type A or B and compares them accordingly.

It's not possible. All you have are two version strings to compare. That's it. No other metadata.

Asking me to release a new version for no reason (well, no inherent reason, as there are no actual code changes) feels like the wrong way to go about it.

What I'm asking you personally to do is recognize that your unknowing choice of "0.10" is breaking packages everywhere. You didn't know before, but you do now. So now that you know, you could easily fix the world by making a new release. But you're choosing a principled stance (with flawed assumptions) and leaving all packages of your software broken. They either have to fake the version like FreeBSD did, or not get the updates because of a version comparison failure.

The thing I'm morally offended by is being asked to manually work around a broken process that I didn't know about and that was imposed on me without asking

I know you didn't know about it, but you do now. But while you are busy being "offended", countless users are being broken.

I didn't write any of these libraries, but I do know enough to assure you that "the standard" can't handle "0.10" and "0.100" as being equivalent. There is no technical solution.

There may be "hundreds" of perl packages like this, but that doesn't mean they are externally packaged. This is the only one I know about. I'm fine with the requirement to individually ask authors to use versioning that works everywhere.

Do you really want to put out the users of your software? Your principled stance won't go anywhere - mainly because there is no solution.

@mauke

mauke commented May 6, 2023

It's not possible. All you have are two version strings to compare. That's it. No other metadata.

Cool, so why not use my second suggestion and convert from format B to format A when packaging CPAN modules?

What I'm asking you personally to do is recognize that your unknowing choice of "0.10" is breaking packages everywhere.

Not everywhere; only those that didn't convert the version format when they packaged the module.

(Which is a bit like packaging a Windows batch file to be run as a shell script, and then complaining to the author that some of the commands don't work in sh and could they please use only commands that do the same thing in cmd.exe and sh?)

I didn't write any of these libraries, but I do know enough to assure you that "the standard" can't handle "0.10" and "0.100" as being equivalent. There is no technical solution.

[int_part, frac_part] = '1.10'.split('.')
dotted_version = int_part + '.' + frac_part.rstrip('0').ljust(9, '0')
# '1.100000000'

(Plus maybe a sanity check that the padded fractional part doesn't exceed 9 characters¹.)

What makes this approach infeasible?


¹) I chose 9 as the padding amount because it is unlikely for a module version to need more than 9 fractional digits and because the resulting number will always fit in a 32-bit integer, which is nice for C libraries that convert dotted components to integers in order to compare them.
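A runnable version of the snippet above with the suggested sanity check, assuming inputs contain exactly one dot (`to_dotted` is a hypothetical name). It also checks the footnote's arithmetic: the largest nine-digit component, 999999999, is below the signed 32-bit maximum of 2147483647:

```python
def to_dotted(version: str, width: int = 9) -> str:
    """Pad a decimal version's fraction to `width` digits (expects exactly one dot)."""
    int_part, frac_part = version.split(".")
    frac_part = frac_part.rstrip("0")
    if len(frac_part) > width:
        raise ValueError(f"{version!r} needs more than {width} fractional digits")
    return int_part + "." + frac_part.ljust(width, "0")


print(to_dotted("1.10"))          # 1.100000000
print(int("9" * 9) <= 2**31 - 1)  # True
```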

@jrmarino
Contributor Author

jrmarino commented May 6, 2023 via email

@mauke

mauke commented May 6, 2023

Uh. You're asking me for a favor because your packaging process is broken and you want me to work around it. Calling me a "bad actor" is not the way to make that happen.

@jrmarino
Copy link
Contributor Author

jrmarino commented May 6, 2023

Not "mine". The entire non-perl world (by your definition of broken).

You've spent more time arguing for the sport of it than you would have spent resolving the problem. You're the only one who can resolve it.

It's clear that you have no intention to do so.
If that's accurate, let's just drop the subject. You're aware; that's all I can do.

@jrmarino
Copy link
Contributor Author

jrmarino commented May 6, 2023

As an epilogue, I just checked our repository and nothing depends on data::munge. It must have been required at some point, but no longer. It has no impact on us to just retire it.

That's not a threat, just saying.

@Leont

Leont commented May 9, 2023

Maybe I should provide some background here, wearing my hat as the current maintainer of version.pm, Perl's official abstraction to deal with this.

For all of these 0.10 is less than 0.097, though. Just to give a few examples:

As far as Perl is concerned, version 0.10 is identical to 0.100 and 0.100.0. This is a side effect of starting out with decimal versions and adding dotted decimals later on. It's a mess for historical reasons.
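A rough Python approximation of the rule behind this, assuming a plain decimal version (no underscores, no leading `v`): the fractional part is right-padded to a multiple of three digits and split into three-digit components, with at least three components in total, similar to what version.pm's `normal` form prints. This is an illustrative sketch, not version.pm's implementation:

```python
def decimal_to_dotted(version: str) -> tuple:
    """Approximate Perl's decimal-to-dotted mapping for a plain decimal version."""
    int_part, _, frac_part = version.partition(".")
    # Round the fraction length up to a multiple of three and pad with zeros.
    width = -(-len(frac_part) // 3) * 3
    frac_part = frac_part.ljust(width, "0")
    parts = [int(int_part)]
    parts += [int(frac_part[i:i + 3]) for i in range(0, width, 3)]
    # Report at least three components, as in v0.100.0.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


print(decimal_to_dotted("0.10"))     # (0, 100, 0)
print(decimal_to_dotted("0.100"))    # (0, 100, 0)
print(decimal_to_dotted("0.097"))    # (0, 97, 0)
print(decimal_to_dotted("1.67001"))  # (1, 670, 10)
```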

For repology, a dedicated library was written which can handle most cases meaningfully, and most package managers implement their own version comparison code. For all of these 0.10 is less than 0.097, though. Just to give a few examples:

I'm afraid we're the exception here. Sorry about that.

Until just now, I missed that the issue was that Data::Munge was released as "0.10" and not "0.100". So the easiest path forward for this individual software is re-releasing it as "0.100" or perhaps "0.101".

Indeed. In particular, the general recommendations are:

  1. The author must not shorten the length of versions of a distribution unless also bumping the major version.
    Failing to follow this rule leads to all sorts of issues for downstreams, repology is not alone at that.

  2. When using decimal versions, the author should use a multiple of 3 digits behind the dot.
    Things don't necessarily break if you deviate, but they may show some surprising behavior.
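A sketch of a pre-release check for these two recommendations, assuming decimal versions with a single dot and treating the integer part as the major version (`check_release` is a hypothetical helper, not an existing CPAN tool):

```python
def check_release(previous: str, current: str) -> list:
    """Hypothetical lint: flag decimal versions that break the two rules above."""
    warnings = []
    prev_major, _, prev_frac = previous.partition(".")
    cur_major, _, cur_frac = current.partition(".")
    # Rule 1: don't shorten the fractional part without bumping the major version.
    if int(cur_major) == int(prev_major) and len(cur_frac) < len(prev_frac):
        warnings.append(f"{current} is shorter than {previous} without a major version bump")
    # Rule 2: prefer a multiple of three digits after the dot.
    if len(cur_frac) % 3 != 0:
        warnings.append(f"{current} does not use a multiple of 3 decimal digits")
    return warnings


print(check_release("0.097", "0.10"))   # two warnings
print(check_release("0.097", "0.100"))  # []
```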

@Grinnz

Grinnz commented May 9, 2023

For more details see my guide to how versions work in Perl: https://blogs.perl.org/users/grinnz/2018/04/a-guide-to-versions-in-perl.html

There is a 0% chance you are going to convince enough CPAN authors to make a difference on that end. If it was possible, we wouldn't have needed this complex abstraction in CPAN's own processes.

@AMDmi3
Copy link
Member

AMDmi3 commented May 10, 2023

There is a 0% chance you are going to convince enough CPAN authors to make a difference on that end. If it was possible, we wouldn't have needed this complex abstraction in CPAN's own processes.

Well, I don't see why compatible versions can't be enforced for new releases on the CPAN side, but that's the Perl community's affair. For the outer world, like I've already mentioned, the problem is already quite small, as only a tiny fraction of CPAN authors [of modules which are packaged at least somewhere] use incompatible versions. But of course it does make sense to enlighten these as well and make the problem yet smaller. What I don't understand is the opposition to the compatible versioning, which makes life easier for many people and goes hand-in-hand with the very idea of publishing code for everyone's good.

@Leont
Copy link

Leont commented May 10, 2023

What I don't understand is the opposition to the compatible versioning, which makes life easier for many people and goes hand-in-hand with the very idea of publishing code for everyone's good.

It feels to me like the point wasn't necessarily communicated clearly. It can read like "you should change your versioning to keep repology happy", which would indeed not be very convincing by itself. The real reason is "you should change it to prevent almost all distributions from having to do extra work".

Arch and its derivatives have worked around this by incrementing the epoch. They would have to do it again the next time this happens.
Fedora and FreeBSD seem to work around this issue by lengthening the version to 0.100 to avoid having to increment the epoch. I imagine most other distros would do the same.
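A simplified sketch of why the epoch works, assuming an Arch-style `epoch:` prefix and numeric dot-separated components only (pacman's `vercmp` additionally handles the `-pkgrel` suffix and non-numeric segments; `epoch_key` is a hypothetical helper). The epoch is compared first, so `1:0.10` outranks any version with the implicit epoch 0:

```python
def epoch_key(version: str) -> tuple:
    """Sort key for an optional 'epoch:' prefix followed by numeric components."""
    epoch, sep, rest = version.partition(":")
    if not sep:  # no epoch given: treat it as 0
        epoch, rest = "0", version
    components = tuple(int(part) for part in rest.split("."))
    return (int(epoch), components)


print(epoch_key("0.10") > epoch_key("0.097"))    # False: (0, 10) < (0, 97)
print(epoch_key("1:0.10") > epoch_key("0.097"))  # True: epoch 1 > epoch 0
```

Lengthening the version to 0.100, as Fedora and FreeBSD did, avoids the epoch because (0, 100) already sorts after (0, 97).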

Both solutions are functional, but they do cause extra work for the various distributors. That should be a convincing argument.

@Grinnz

Grinnz commented May 17, 2023

There's no opposition to the idea; you will see in my guide that I recommend suggestions like @Leont mentioned to avoid the issue. But ultimately it is up to each individual CPAN author to be aware that the problem even exists.

@Grinnz

Grinnz commented May 17, 2023

Addressing this at the Perl-centric level would have to happen in PAUSE or MetaCPAN. I don't think it's reasonable to reject indexing in PAUSE when running afoul of external version progression, but an indexer warning would be perfectly reasonable and raise awareness. MetaCPAN has no authority as it's just a search engine, but it could display something for awareness.
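A sketch of the kind of indexer warning described above, assuming plain decimal versions with a single dot; `indexer_warning` is hypothetical and not an existing PAUSE feature. It flags a release that moves forward numerically but backward under the dot-separated-integer reading used by most downstream tools:

```python
from decimal import Decimal


def dotted_key(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))


def indexer_warning(previous: str, current: str):
    """Return a warning string if the two readings disagree on the ordering."""
    numerically_newer = Decimal(current) > Decimal(previous)
    dotted_older = dotted_key(current) < dotted_key(previous)
    if numerically_newer and dotted_older:
        return (f"{current} is newer than {previous} as a decimal number, "
                f"but dotted-integer version comparison treats it as older")
    return None


print(indexer_warning("0.097", "0.10"))   # prints the warning
print(indexer_warning("0.097", "0.100"))  # None
```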
